

Application of the Semi-Supervised Learning Algorithm Laplace Support Vector Machine to Protein Structural Class Prediction

2020-09-02 07:14 · Wu Jiang, Dong Ting, Jiang Ping
Microcomputer Applications, 2020, No. 8


Abstract:

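The semi-supervised learning method Laplace Support Vector Machine (LapSVM) is applied to predict protein structural classes. First, seven physicochemical properties of amino acids are used as substitution models to convert protein sequences into numeric sequences. The auto covariance (AC) transform then describes the interactions between amino acid residues a certain distance apart and converts the numeric sequences into vectors of uniform length, which form the feature space of the samples. Next, 20, 50, 80, 110, 140, and 170 samples are randomly selected from the dataset as unlabelled samples to construct the training sets, and a one-against-all decomposition strategy and leave-one-out cross-validation are used to evaluate the predictive ability of the LapSVM model. The classifier predicts the structural class of protein samples with an accuracy of 94.12%, which is clearly competitive with the 90.69% accuracy of the standard Support Vector Machine (SVM). The experimental results verify that the distribution information of unlabelled samples, acting as a weak rule, can effectively improve the predictive performance of the classifier. The work also offers a novel approach: applying a semi-supervised method to a fully supervised learning problem, with a smaller optimization scale and better predictive ability.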
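The way unlabelled samples act as a weak rule can be made concrete with the standard LapSVM (manifold regularization) objective of Belkin, Niyogi and Sindhwani. This is a sketch of that textbook formulation, not necessarily the exact variant used by the authors: the abstract does not state the kernel K, how the neighborhood graph is built, or the regularization weights gamma_A and gamma_I, so they are left symbolic here. With l labelled and u unlabelled samples, each binary sub-problem of the one-against-all decomposition reads:

```latex
f^{*} = \arg\min_{f \in \mathcal{H}_K}
  \frac{1}{l} \sum_{i=1}^{l} \max\bigl(0,\; 1 - y_i f(\mathbf{x}_i)\bigr)
  + \gamma_A \lVert f \rVert_K^{2}
  + \frac{\gamma_I}{(l+u)^{2}} \, \mathbf{f}^{\top} L \, \mathbf{f},
\qquad
\mathbf{f}^{\top} L \, \mathbf{f}
  = \frac{1}{2} \sum_{i,j=1}^{l+u} W_{ij} \bigl(f(\mathbf{x}_i) - f(\mathbf{x}_j)\bigr)^{2}
```

Here f = [f(x_1), ..., f(x_{l+u})]^T, L = D - W is the graph Laplacian built over all l + u samples with symmetric edge weights W_ij and degrees D_ii = sum_j W_ij, and y_i is +1 or -1 depending on whether labelled sample i belongs to the current class. The last term penalizes decision functions that change sharply between neighboring samples, labelled or not, which is how the unlabelled distribution shapes the classifier. By the representer theorem the minimizer has the form f*(x) = sum_{i=1}^{l+u} alpha_i K(x_i, x), up to an unregularized bias term.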

Keywords:

semi-supervised learning; protein structural class; Laplace support vector machine; auto covariance transform

CLC number: TP 391

Document code: A

Protein Structural Class Prediction Using Laplace Support Vector Machine Based on a Semi-Supervised Method

WU Jiang¹, DONG Ting¹, JIANG Ping¹,²

(1. Department of Information Engineering, Yulin University, Yulin, Shaanxi 719000, China;

2. School of Computer Science and Technology, Xidian University, Xi'an, Shaanxi 710071, China)

Abstract:

The purpose of this study is to predict protein structural classes using the Laplace support vector machine (LapSVM), a novel semi-supervised learning method. First, seven amino acid physicochemical properties taken from the literature were used to transform protein sequences into numeric sequences, and auto covariance (AC) was then used to transform these property sequences into feature vectors of the same length, which are suitable for training models. AC captures neighboring effects and the interactions between residues a certain distance apart in a protein sequence. Second, 20, 50, 80, 110, 140, and 170 samples were randomly selected as unlabelled samples to construct the training sets, and the one-against-all strategy and leave-one-out cross-validation were employed to estimate performance. A prediction accuracy of 94.12% was obtained, which compares favorably with the 90.69% accuracy of the standard Support Vector Machine (SVM). The experimental results show that the distribution information of unlabelled samples, used as a weak rule, can effectively improve prediction performance. The study also suggests a novel idea: using a semi-supervised method to solve a supervised learning problem, with a smaller optimization scale and higher prediction accuracy.
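The AC step described above can be sketched in a few lines of Python. This is a minimal sketch assuming the common auto covariance definition AC(lag, j) = 1/(L - lag) * sum_i (P[i][j] - mean_j) * (P[i+lag][j] - mean_j), where P[i][j] is the value of property j at residue i, mean_j is its average over the sequence, and L is the sequence length. The abstract gives neither the formula, the maximum lag, nor the seven property values, so the function `auto_covariance`, the `props` table, and `max_lag` are hypothetical names, and the toy table below contains made-up numbers.

```python
from typing import Dict, List, Sequence


def auto_covariance(seq: str,
                    props: Dict[str, Sequence[float]],
                    max_lag: int) -> List[float]:
    """Auto covariance (AC) features of a protein sequence (sketch).

    For each lag in 1..max_lag and each property j:
        AC(lag, j) = 1/(L - lag) * sum_{i=0}^{L-lag-1}
                     (P[i][j] - mean_j) * (P[i+lag][j] - mean_j)
    The result has n_props * max_lag entries, whatever L is.
    """
    L = len(seq)
    if L <= max_lag:
        raise ValueError("sequence must be longer than max_lag")
    # Substitute each residue by its property vector (KeyError if a
    # residue is missing from the hypothetical `props` table).
    P = [props[aa] for aa in seq]
    n_props = len(P[0])
    # Per-property mean over this sequence.
    means = [sum(row[j] for row in P) / L for j in range(n_props)]
    features: List[float] = []
    for lag in range(1, max_lag + 1):
        for j in range(n_props):
            s = sum((P[i][j] - means[j]) * (P[i + lag][j] - means[j])
                    for i in range(L - lag))
            features.append(s / (L - lag))
    return features


# Toy usage with a made-up 2-property table (NOT real property values).
toy_props = {"A": [0.62, -0.5], "G": [0.48, 0.0], "L": [1.06, -1.8]}
vec = auto_covariance("ALGALLG", toy_props, max_lag=3)
print(len(vec))  # 6 = 2 properties x 3 lags
```

Because the output length depends only on the number of properties and the maximum lag, proteins of any length map to vectors of the same size; with seven properties and a maximum lag lg, each protein would become a 7 × lg vector that an SVM or LapSVM can consume directly.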

Key words:

semi-supervised learning; protein structural class; Laplace support vector machine; auto covariance
