Sparse Representation Preserving Discriminative Feature Selection Algorithm

2015-09-16 21:52

Modern Electronics Technique, 2015, No. 18

Xia Guangsheng, Yan Hui

Abstract: Sparse representation, as a representation based on partial data, has attracted increasing attention and is widely used in pattern recognition and machine learning. A new algorithm, called sparse representation preserving discriminative feature selection (SRPFS), is proposed. Its aim is to select a discriminative feature subset such that, in the selected feature subspace, the difference between the sparse within-class reconstruction residual and the sparse between-class reconstruction residual of the samples is minimized. Unlike traditional algorithms, which evaluate and select features independently, the algorithm selects the most discriminative features in batch mode by optimizing the proposed $l_{2,1}$-norm minimization objective function. Experimental results on standard UCI datasets and the Columbia object image database show that the algorithm outperforms other classic feature selection algorithms in recognition performance and stability.

Keywords: feature selection; sparse representation; reconstruction residual; $l_{2,1}$-norm

CLC number: TN911-34 Document code: A Article ID: 1004-373X(2015)18-0008-05

0 Introduction

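Feature selection [1] selects a subset of features from a high-dimensional feature space while preserving the original physical meaning of the selected features. Depending on whether class labels are used, feature selection algorithms are either unsupervised or supervised; this paper studies supervised feature selection. Classic supervised feature selection algorithms include ReliefF [2], Fisher Score [3], and Multi-Cluster Feature Selection (MCFS) [4]; they measure the importance of a feature by its correlation with the class labels. However, most traditional feature selection algorithms evaluate each feature independently [3,5] and add features to the selected feature subspace one at a time, so the correlation among features is ignored [4]. Recently, $l_{2,1}$-norm regularized optimization has been applied to feature selection; such algorithms select features by imposing an $l_{2,1}$-norm minimization constraint on the feature selection matrix [6-7].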
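As a small illustration of the $l_{2,1}$-norm mentioned above, the sketch below computes it as the sum of the $l_2$-norms of the rows of a matrix and ranks features by row norm. Ranking rows this way is how $l_{2,1}$-regularized methods typically read off selected features; it is shown here as an assumption and not as the exact selection rule of SRPFS. The matrix `W` and the helper names are hypothetical.

```python
import math

def row_norms(W):
    """Return the l2-norm of each row of W (a list of equal-length rows)."""
    return [math.sqrt(sum(v * v for v in row)) for row in W]

def l21_norm(W):
    """l2,1-norm: the sum of the l2-norms of the rows of W."""
    return sum(row_norms(W))

def top_features(W, k):
    """Indices of the k rows with the largest l2-norm, largest first."""
    norms = row_norms(W)
    return sorted(range(len(W)), key=lambda i: norms[i], reverse=True)[:k]

# Hypothetical selection matrix: 4 features (rows), 2 columns.
# The values are illustrative only and do not come from the paper.
W = [[3.0, 4.0],
     [0.0, 0.0],
     [0.6, 0.8],
     [0.0, 5.0]]

norm_value = l21_norm(W)       # row norms are about 5, 0, 1, 5
selected = top_features(W, 2)  # rows 0 and 3 carry the largest norms
```

Because the penalty sums whole-row norms, minimizing it tends to drive entire rows to zero, which is why a feature is kept or discarded as a unit.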

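Meanwhile, sparse representation, as a representation based on partial data, has attracted increasing attention and has been widely applied in pattern recognition and machine learning [8]. Sparse representation methods assume that a given sample can be reconstructed by a sparse linear combination of samples from an overcomplete dictionary. One example is the sparse representation-based classification (SRC) method proposed by Wright et al. [9], whose optimization problem penalizes the $l_1$-norm of the linear combination coefficients. SRC tries to represent a given test sample as a sparse linear combination of all training samples and assumes that the nonzero coefficients concentrate on the training samples of the same class as the test sample. Inspired by SRC, many sparse representation-based feature extraction algorithms have appeared, such as the SRC-steered supervised feature extraction algorithms proposed in [10-11]. Both aim to reduce the within-class reconstruction residual while increasing the between-class reconstruction residual, but their objective functions differ in form: [10] uses a ratio, whereas [11] uses a difference. Unlike feature selection, feature extraction reduces dimensionality by transforming the original features, which changes their original physical meaning. Among the classic supervised feature selection algorithms, however, none is directly related to SRC. This paper therefore proposes the sparse representation preserving discriminative feature selection (SRPFS) algorithm, which seeks a linear mapping such that, in the selected feature subspace, the sparse within-class reconstruction residual of the samples is sufficiently small and the sparse between-class reconstruction residual is sufficiently large. The mapping is obtained by optimizing the proposed $l_{2,1}$-norm minimization objective function.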
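The SRC idea described above can be sketched in a few lines. The example is a minimal illustration under stated assumptions: it solves the penalized form min 0.5*||y - A x||^2 + lam*||x||_1 with plain ISTA (a simpler relative of the FISTA method in [18]), which is not necessarily the solver or formulation used in this paper. The toy dictionary `A`, the labels, `lam`, and all function names are hypothetical. Columns of `A` are training samples.

```python
import math

def matvec(A, x):
    """Multiply matrix A (list of rows) by vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def soft_threshold(v, t):
    """Shrink v towards zero by t (proximal operator of t*|v|)."""
    return max(v - t, 0.0) - max(-v - t, 0.0)

def ista(A, y, lam=0.01, n_iter=500):
    """Approximately solve min_x 0.5*||y - A x||^2 + lam*||x||_1 by ISTA."""
    At = transpose(A)
    # Step size 1/L, where L is the squared Frobenius norm of A,
    # an upper bound on the largest eigenvalue of A^T A.
    L = sum(v * v for row in A for v in row)
    x = [0.0] * len(At)
    for _ in range(n_iter):
        r = [ri - yi for ri, yi in zip(matvec(A, x), y)]  # A x - y
        g = matvec(At, r)                                  # gradient A^T (A x - y)
        x = [soft_threshold(xi - gi / L, lam / L) for xi, gi in zip(x, g)]
    return x

def class_residuals(A, labels, y, x):
    """Residual ||y - A x_c|| per class c, where x_c keeps only class-c coefficients."""
    out = {}
    for c in sorted(set(labels)):
        xc = [xi if lab == c else 0.0 for xi, lab in zip(x, labels)]
        rec = matvec(A, xc)
        out[c] = math.sqrt(sum((yi - ri) ** 2 for yi, ri in zip(y, rec)))
    return out

# Toy dictionary (illustrative values): columns 0-1 belong to class 0,
# columns 2-3 to class 1. The test sample y lies close to class 0.
A = [[1.0, 0.9, 0.0, 0.0],
     [0.0, 0.1, 1.0, 0.9],
     [0.0, 0.0, 0.0, 0.1]]
labels = [0, 0, 1, 1]
y = [1.0, 0.05, 0.0]

x = ista(A, y)
residuals = class_residuals(A, labels, y, x)
predicted = min(residuals, key=residuals.get)  # class with the smallest residual
```

The within-class residual is the entry of `residuals` for the sample's own class, and the remaining entries are between-class residuals. These are the two quantities that SRPFS, as described above, tries to make small and large respectively in the selected feature subspace.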

1 Sparse representation-based classification

3 Experiments

This section evaluates the performance of SRPFS experimentally. SRPFS is first compared with classic supervised feature selection algorithms, and its convergence is then analyzed.

3.1 Experimental setup

Four public datasets are used: Wine [16], Breast Cancer Wisconsin (Diagnostic) [16], Connectionist Bench (Sonar, Mines vs. Rocks) [16], and COIL20 [17]. Wine, Breast Cancer Wisconsin, and Connectionist Bench come from the standard UCI repository; COIL20, from the Columbia object image database, contains 20 objects. The datasets are described in Table 1.

3.2 Comparison of recognition rates

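For each dataset, 80% of the samples in each class are randomly selected as the training set, and the remaining samples form the test set. To assess the reliability of the different algorithms, the selection of training and test sets is repeated 10 times. Table 2 compares the average best recognition rates (± std) of the five methods, All Features, Fisher Score, MCFS, ReliefF, and SRPFS, on the four datasets. All feature selection algorithms outperform All Features, so feature selection helps to improve the recognition rate. Because SRPFS preserves the sparse correlation among samples, it clearly outperforms the other methods in both recognition rate and stability.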
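The evaluation protocol above (a random per-class 80/20 split repeated 10 times, then mean and standard deviation) can be sketched as follows. This is a minimal sketch, not the authors' code: the classifier is not specified in this section, so a 1-nearest-neighbour rule on toy 1-D data stands in for it. The sample standard deviation is used as an assumption, and the search over the number of selected features that yields the "best" rate is omitted. All names and data are hypothetical.

```python
import random
import statistics

def stratified_split(labels, train_frac, rng):
    """Return (train_idx, test_idx), drawing train_frac of each class at random."""
    by_class = {}
    for i, lab in enumerate(labels):
        by_class.setdefault(lab, []).append(i)
    train, test = [], []
    for idx in by_class.values():
        idx = idx[:]                      # copy before shuffling
        rng.shuffle(idx)
        cut = round(train_frac * len(idx))
        train += idx[:cut]
        test += idx[cut:]
    return sorted(train), sorted(test)

def repeated_evaluation(labels, score_fn, n_repeats=10, train_frac=0.8, seed=0):
    """Mean and sample standard deviation of score_fn over repeated random splits."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_repeats):
        train, test = stratified_split(labels, train_frac, rng)
        scores.append(score_fn(train, test))
    return statistics.mean(scores), statistics.stdev(scores)

# Toy data: two well-separated 1-D classes with 10 samples each (illustrative only).
X = [0.1 * i for i in range(10)] + [5.0 + 0.1 * i for i in range(10)]
y = [0] * 10 + [1] * 10

def one_nn_accuracy(train, test):
    """Placeholder classifier: accuracy of a 1-nearest-neighbour rule."""
    hits = 0
    for t in test:
        nearest = min(train, key=lambda j: abs(X[j] - X[t]))
        hits += y[nearest] == y[t]
    return hits / len(test)

mean_acc, std_acc = repeated_evaluation(y, one_nn_accuracy)
```

In a real comparison, `score_fn` would run a feature selection method on the training indices, train a classifier on the selected features, and return the recognition rate on the test indices.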

3.3 Convergence

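This section shows experimentally that the proposed iterative algorithm monotonically decreases the objective function value until convergence. Figure 1 shows the convergence curves of the objective function in Eq. (12) on the four datasets; the objective function converges after a few iterations.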

4 Conclusion

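This paper proposed a new supervised feature selection algorithm, sparse representation preserving discriminative feature selection (SRPFS). Its aim is to select a discriminative feature subset such that, in the selected feature subspace, the difference between the sparse within-class reconstruction residual and the sparse between-class reconstruction residual of the samples is minimized. SRPFS was evaluated experimentally and compared with four other methods (All Features, Fisher Score, MCFS, and ReliefF) on four public datasets. The experiments show that SRPFS clearly outperforms the other methods in recognition rate and stability. Future work will consider applying the idea of SRPFS to unsupervised feature selection, which is more challenging because the class labels of the samples are not used.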

References

[1] KOLLER D, SAHAMI M. Toward optimal feature selection [C]// Proceedings of the 13th International Conference on Machine Learning. Bari, Italy: [s. n.], 1996: 284-292.

[2] KONONENKO I. Estimating attributes: analysis and extensions of RELIEF [C]// Proceedings of the 1994 European Conference on Machine Learning. Catania, Italy: [s. n.], 1994: 171-182.

[3] DUDA R O, HART P E, STORK D G. Pattern Classification [M]. 2nd ed. New York, USA: John Wiley & Sons, 2001.

[4] CAI Deng, ZHANG Chiyuan, HE Xiaofei. Unsupervised feature selection for multi-cluster data [C]// Proceedings of the 2010 ACM Special Interest Group on Knowledge Discovery and Data Mining. Washington, USA: [s. n.], 2010: 333-342.

[5] HE Xiaofei, CAI Deng, NIYOGI P. Laplacian score for feature selection [C]// Proceedings of Advances in Neural Information Processing Systems. Vancouver, Canada: [s. n.], 2005: 231-236.

[6] YANG Yi, SHEN Hengtao, MA Zhigang, et al. $l_{2,1}$-norm regularized discriminative feature selection for unsupervised learning [C]// Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence. Barcelona, Spain: [s. n.], 2011: 1589-1594.

[7] NIE Feiping, HUANG Heng, CAI Xiao, et al. Efficient and robust feature selection via joint $l_{2,1}$-norms minimization [C]// Proceedings of the 2010 Advances in Neural Information Processing Systems 23. Vancouver, Canada: [s. n.], 2010: 1-9.

[8] WANG J J Y, BENSMAIL H, YAO N, et al. Discriminative sparse coding on multi-manifolds [J]. Knowledge-Based Systems, 2013, 54: 199-206.

[9] WRIGHT J, YANG A Y, GANESH A, et al. Robust face recognition via sparse representation [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2009, 31(2): 210-227.

[10] YANG Jian, CHU Delin. Sparse representation classifier steered discriminative projection [C]// Proceedings of the 20th International Conference on Pattern Recognition. Istanbul, Turkey: [s. n.], 2010: 694-697.

[11] LU Canyi, HUANG Deshuang. Optimized projections for sparse representation based classification [J]. Neurocomputing, 2013, 113: 213-219.

[12] DONOHO D L. For most large underdetermined systems of linear equations the minimal $l_1$-norm solution is also the sparsest solution [J]. Communications on Pure and Applied Mathematics, 2006, 59(6): 797-829.

[13] CANDES E, ROMBERG J, TAO T. Stable signal recovery from incomplete and inaccurate measurements [J]. Communications on Pure and Applied Mathematics, 2006, 59(8): 1207-1223.

[14] CHEN S S B, DONOHO D L, SAUNDERS M A. Atomic decomposition by basis pursuit [J]. SIAM Review, 2001, 43(1): 129-159.

[15] DONOHO D L, TSAIG Y. Fast solution of $l_1$-norm minimization problems when the solution may be sparse [J]. IEEE Transactions on Information Theory, 2008, 54(11): 4789-4812.

[16] Author unknown. UCI Donald Bren School of Information & Computer Sciences [EB/OL]. [2015-03-04]. http://www.ics.uci.edu/.

[17] Author unknown. Other standard data sets in matlab format [EB/OL]. [2015-03-04]. http://www.cad.zju.edu.cn/home/dengcai/Data/MLData.html.

[18] BECK A, TEBOULLE M. A fast iterative shrinkage-thresholding algorithm for linear inverse problems [J]. SIAM Journal on Imaging Sciences, 2009, 2(1): 183-202.
