Strategies and software for mitochondrial genome assembly in the genomic era

2019-11-28 11:59:18 Weimin Kuang, Li Yu
Hereditas (Beijing), 2019, Issue 11
Keywords: mitochondrial genome sequencing

Weimin Kuang, Li Yu

School of Life Sciences, Yunnan University; State Key Laboratory for Conservation and Utilization of Bio-Resources in Yunnan, Kunming 650091

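With continuing advances in sequencing technology, whole-genome data have been generated and widely used for a growing number of species. Amid the explosive growth of second-generation genomic data, mitochondrial genome data remain important alongside nuclear genome data. High-throughput whole-genome sequencing data contain mitochondrial as well as nuclear sequences, and extracting, assembling and applying mitochondrial genome sequences from such massive data sets has become a research topic for mitochondrial genomics in molecular biology, genetics and medicine. Accordingly, strategies and software for extracting mitochondrial genome sequences from whole-genome data have developed steadily. Depending on how mitochondrial reads are anchored in the whole-genome data and on the subsequent assembly strategy, these approaches fall into reference-based and de novo assembly methods, and each strategy and tool has its own strengths and limitations. In this review, we summarize and compare current strategies and software for obtaining mitochondrial genome data from whole-genome data and offer recommendations on choosing among them, with the aim of providing a methodological reference for mitochondrial genome research in the life sciences.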

Keywords: whole genome; mitochondrial genome; reference-based assembly; de novo assembly; assembly software

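The mitochondrial genome is a distinctive and readily obtainable genetic marker. It has a high mutation rate, lacks recombination, is present in high copy number and is maternally inherited [1], so it is widely used in phylogenetics and biogeography [2~5], population genetics [6~13], medicine [14~17] and ecology [18~20]. In early studies, mitochondrial genome sequences were obtained by first amplifying the genome with long-range PCR (LR-PCR) or cloning-based PCR and then sequencing it by primer walking with Sanger sequencing. This approach is accurate but low-throughput, laborious, time-consuming and expensive. The development of sequencing technology, in particular next-generation sequencing (NGS), and the rapid fall in sequencing costs have made mitochondrial genomes much easier to obtain. NGS and its derived approaches (e.g., LR-PCR followed by NGS, RNA sequencing followed by gap filling, and direct shotgun sequencing [21~23]) have made high-throughput sequencing routine. Compared with traditional Sanger sequencing, NGS offers higher throughput and produces whole-genome sequencing (WGS) data, exome sequences and gene transcripts faster and at lower cost [24]. The basic principle of NGS is as follows: total DNA from a sample, or purified mitochondrial DNA, is randomly fragmented into 50~700 bp pieces to build a library (the fragment size depends on the library preparation platform), sequencing adapters are ligated to both ends of the short fragments, and the resulting millions of DNA molecules are sequenced in parallel, yielding large numbers of sequences efficiently, accurately and quickly; the mitochondrial genome is then recovered from the massive WGS data by bioinformatic analysis. In recent years, third-generation single-molecule sequencing, represented by Pacific Biosciences (PacBio) and Oxford Nanopore, has advanced rapidly. These platforms require neither random DNA fragmentation nor PCR amplification, and read lengths reach tens of kilobases or even 100 kb, so higher-quality genome assemblies can be obtained. Advances in genomic technology have likewise led to explosive growth in mitochondrial sequence data. Consequently, more and more researchers are obtaining mitochondrial genomes from WGS data using a variety of strategies [23,25~39].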

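In the NGS era, isolating and enriching mitochondrial DNA efficiently while avoiding contamination by nuclear DNA is key to mitochondrial genome sequencing and downstream analysis. Two isolation strategies are currently used. (1) Physically separating and purifying mitochondrial DNA from total DNA before NGS. Nuclear and mitochondrial DNA are first separated by caesium chloride density-gradient centrifugation, differential centrifugation, or commercial kits with enrichment beads [40,41], and the purified mitochondrial DNA is then used for library construction and high-throughput sequencing. Because nuclear and mitochondrial (or chloroplast) DNA are separated before sequencing, the resulting data are ensured to come from the mitochondrion (or chloroplast). The advantage of this method is that it avoids contamination by nuclear DNA, namely by mitochondrial sequences transferred into the nuclear genome (nuclear mitochondrial pseudogenes, Numts [42]). However, the kits used for physical separation are expensive, the procedure is cumbersome and labour-intensive, and it places demands on both sample quality and quantity, so many challenges remain [43,44], especially in studies of rare protected wildlife and ancient DNA (aDNA). (2) PCR amplification followed by NGS of the amplicons. Primers are first used to amplify target fragments of the mitochondrial genome, and the amplicons are then sequenced directly on an NGS platform without building a whole-genome DNA library [45]. The advantage of this method is that it needs little starting DNA, which makes it particularly suitable for small insects and environmental DNA studies; its success depends on the quality of the template DNA and the specificity of the PCR primers.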

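NGS data are widely used across the life sciences and play an especially important role in evolutionary biology and population genetics for revealing the origins and dispersal histories of species. Researchers frequently find that nuclear and mitochondrial data yield discordant genealogies, particularly in lineages with complex population histories (e.g., gene flow, genetic drift, biased migration and incomplete lineage sorting). Thus, when analysing NGS data, mitochondrial genome data are important alongside nuclear data. Whole-genome data produced by NGS contain both mitochondrial and nuclear reads. Although the sequencing depth of mitochondrial reads is 100~1000 times that of nuclear reads (a cell contains tens to hundreds of mitochondrial genome copies) [46], mitochondrial reads make up only a small fraction of all WGS reads, because the mitochondrial genome is tiny compared with the nuclear genome, and they are often contaminated by nuclear and (in green plants) chloroplast reads. Efficient bioinformatic tools and strategies are therefore needed to extract mitochondrial reads quickly and accurately from massive WGS data and to assemble them into complete and accurate mitochondrial genomes [36]. This review summarizes the assembly strategies and software currently used to obtain mitochondrial genome sequences from WGS data and offers recommendations on choosing among them.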
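The contrast between high mitochondrial depth and a low mitochondrial share of reads follows from simple proportions. The sketch below is illustrative only: it assumes reads are sampled uniformly from all DNA in a cell and a diploid nuclear genome, and the function names and example values (a 16,569 bp, human-sized mitogenome; a 3.1 Gb nuclear genome; 500 mtDNA copies per cell) are assumptions chosen for the example, not data from the studies cited above.

```python
def mito_depth_ratio(mt_copies_per_cell: float, ploidy: int = 2) -> float:
    """Mitochondrial depth relative to nuclear depth.

    Each nuclear locus is present `ploidy` times per cell and each mtDNA
    position `mt_copies_per_cell` times, so depth scales with their ratio.
    """
    return mt_copies_per_cell / ploidy


def mito_read_fraction(mt_copies_per_cell: float, mt_length: int,
                       nuclear_length: float, ploidy: int = 2) -> float:
    """Expected fraction of WGS reads that derive from mtDNA.

    Assumes reads are drawn uniformly from all DNA in the cell, so each
    genome contributes reads in proportion to (copies x length).
    """
    mt_bases = mt_copies_per_cell * mt_length
    nuclear_bases = ploidy * nuclear_length
    return mt_bases / (mt_bases + nuclear_bases)


if __name__ == "__main__":
    # Illustrative, roughly human-like values (assumptions, not measured data).
    print(mito_depth_ratio(500))  # 250.0
    print(f"{mito_read_fraction(500, 16_569, 3.1e9):.4%}")
```

With these values mitochondrial depth is 250 times nuclear depth, yet only about 0.13% of reads are mitochondrial, which is why efficient read capture matters.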

1 Reference-based assembly strategies and software

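Reference-based assembly uses the mitochondrial genome, or a partial mitochondrial sequence, of a closely related species as a reference to capture mitochondrial reads from the whole-genome data of the study taxon. Depending on whether a complete mitochondrial genome is required as the reference for capturing mitochondrial reads from WGS data, current strategies fall into two groups: (1) assembly based on a complete mitochondrial genome and (2) assembly based on mitochondrial fragments [47,48] (Fig. 1). In the analysis workflow, a whole-genome aligner (e.g., BWA [49]) first maps all reads to the mitochondrial reference and captures mitochondrial reads by sequence similarity; the captured reads are then extended with various extension strategies until the full-length mitochondrial genome is reconstructed.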

1.1 Assembly strategies and software based on a complete mitochondrial genome

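Using a complete mitochondrial genome as the reference to obtain the mitochondrial sequences of species or populations is widely applied in phylogenetic and population genetic studies. For example, Ko et al. [50] used the mitochondrial genome of the extant giant panda as a reference and recovered the mitochondrial genome of a 22,000-year-old giant panda. The principle is homology-based: WGS reads are mapped to the mitochondrial genome of a closely related species, and the sequence is then extended according to the overlaps among mitochondrial reads (Fig. 1). This approach readily yields a consensus sequence relative to the reference, is highly accurate, runs fast and requires few computational resources.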
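Calling a consensus against a reference can be pictured as a column-by-column majority vote over the mapped reads. The sketch below is a simplification, not the method of any particular tool: it assumes ungapped alignments supplied as hypothetical (start, sequence) pairs, whereas real pipelines call consensus from BWA/SAMtools-style BAM files that also carry indels and base qualities.

```python
from collections import Counter
from typing import List, Tuple


def majority_consensus(ref_length: int,
                       mapped_reads: List[Tuple[int, str]],
                       min_depth: int = 3) -> str:
    """Build a majority-rule consensus over reference coordinates.

    mapped_reads: (0-based start, read sequence) pairs from an ungapped
    alignment to the reference. Positions covered by fewer than
    `min_depth` reads are reported as 'N' (a gap in the consensus).
    """
    columns = [Counter() for _ in range(ref_length)]
    for start, seq in mapped_reads:
        for offset, base in enumerate(seq.upper()):
            pos = start + offset
            if 0 <= pos < ref_length:  # ignore overhangs past the reference ends
                columns[pos][base] += 1
    consensus = []
    for col in columns:
        if sum(col.values()) < min_depth:
            consensus.append("N")
        else:
            consensus.append(col.most_common(1)[0][0])
    return "".join(consensus)


reads = [(0, "ACGA"), (0, "ACGA"), (0, "ACGT"), (4, "ACGT")]
print(majority_consensus(8, reads, min_depth=3))  # ACGANNNN
```

Positions with too few reads come out as N, which is how regions that fail to map against a divergent reference surface as gaps in the final sequence.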

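With advances in sequencing technology, demands on data analysis have grown, especially in human mitochondrial genome research, including studies of human evolutionary history and human mitochondrial diseases [51,52]; this has driven the development of software for assembling and annotating human mitochondrial genomes (Table 1). MIA was one of the earliest programs used to assemble human mitochondrial genomes: after high-throughput sequencing of DNA extracted from Neanderthal bone, researchers used a modern human mitochondrial genome as the reference and recovered the Neanderthal mitochondrial genome with MIA [53]. As human mitochondrial data accumulated and the field expanded, new demands were placed on analysis capacity and software functionality. A number of tools with web or Windows graphical user interfaces are now widely used, including MitoBamAnnotator [54], MitoSeek [55], mtDNA-profiler [56], mit-o-matic [57], MToolBox [58], Phy-Mer [59], mtDNA-Server [60] and MitoSuite [61]. These tools support multiple input formats, and all except mtDNA-profiler and mit-o-matic accept binary BAM files, so they can read the output of other programs directly, which speeds up the overall workflow. The number of reference genomes offered differs among tools: MitoBamAnnotator, mtDNA-profiler and mit-o-matic provide only one human reference (rCRS); MitoSeek (rCRS, hg19), mtDNA-Server (rCRS, RSRS) and MToolBox (rCRS, RSRS) provide two; and MitoSuite provides five (rCRS, RSRS, hg19, GRCh37 and GRCh38). Phy-Mer lets users supply a custom reference sequence. In addition, MitoBamAnnotator, MitoSeek, MToolBox, mtDNA-Server, mit-o-matic and MitoSuite let users set parameters (e.g., the minimum allele frequency, MAF) to detect variant sites and heteroplasmic sites in the mitochondrial genome. A heteroplasmic site is a position at which two or more bases are observed; it may reflect technical artifacts or contamination (sequencing errors, non-specific amplification, read mismapping, etc.) or genuine endogenous heteroplasmy. MitoBamAnnotator mainly assesses and predicts the potential functional effects of heteroplasmic sites, but its functionality is limited. MitoSeek and MToolBox extend the analysis to mitochondrial copy number, alignment quality and structural variant detection. MitoSeek can also visualize detected variants, including structural variations (SVs) and single nucleotide polymorphisms (SNPs), with Circos [62]. MToolBox can analyse multiple individuals in a single run and records variants in VCF files, which are easy to parse and annotate.

In terms of usability, MitoSeek and MToolBox are Perl-based tools that run under Linux and require several separate Perl modules plus an aligner (BWA) and a variant caller (GATK [63]), so they are relatively difficult to install and use for users without a bioinformatics background. mtDNA-Server and mit-o-matic are web-based graphical tools: users need no complex installation and simply register with an email address, upload their data and run the analysis. They are easy to use but are constrained by input file size, and uploading data from deeply sequenced individuals can be slow. The recently developed MitoSuite offers more practical and more powerful functions, including human mitochondrial genome assembly, variant detection, disease-variant annotation and functional prediction, copy-number estimation, quality control and coverage visualization. Unlike earlier tools, MitoSuite needs no additional complex computational modules; it is an easy-to-use graphical program that runs locally and can build consensus sequences directly from BAM files for phylogenetic or population genetic analysis [61]. For human mitochondrial genome research, MitoSuite is therefore a strong choice.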
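The MAF-based heteroplasmy screens described above can be sketched as a per-position threshold on the minor-allele fraction. In this sketch the input format (position mapped to base counts), the function name and the default thresholds (depth of at least 100, MAF of at least 5%) are illustrative assumptions, not the defaults of MitoSeek, MToolBox or any other tool; production tools typically also weigh base quality and strand bias.

```python
from collections import Counter
from typing import Dict, List, Tuple


def heteroplasmic_sites(base_counts: Dict[int, Counter],
                        min_depth: int = 100,
                        maf: float = 0.05) -> List[Tuple[int, str, str, float]]:
    """Flag candidate heteroplasmic positions from per-position base counts.

    A site is reported when its depth is at least `min_depth` and the second
    most frequent base makes up at least `maf` of the reads at that site.
    Returns (position, major base, minor base, minor fraction) tuples.
    """
    hits = []
    for pos in sorted(base_counts):
        counts = base_counts[pos]
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads to separate heteroplasmy from noise
        ranked = counts.most_common(2)
        if len(ranked) < 2:
            continue  # only one base observed: homoplasmic
        (major, _), (minor, minor_n) = ranked
        fraction = minor_n / depth
        if fraction >= maf:
            hits.append((pos, major, minor, round(fraction, 4)))
    return hits


counts = {
    3243: Counter(A=900, G=100),  # hypothetical counts
    100: Counter(C=995, T=5),
    50: Counter(A=20, G=20),
}
print(heteroplasmic_sites(counts))  # [(3243, 'A', 'G', 0.1)]
```

Lowering `maf` reports more sites but also admits more sequencing errors and Numt-derived reads, which is why the threshold is user-adjustable in the tools above.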

Fig. 1 Strategies for obtaining and assembling mitochondrial genomes from whole-genome sequencing data

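The workflow is redrawn and modified from references [36,47,66]. Solid boxes represent short reads from whole-genome data; dashed boxes represent methods for obtaining the mitochondrial genome sequence.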

Table 1 Information on mitochondrial genome assembly software

Tools are listed in order of publication. "√" indicates that a function is supported; "×" indicates that it is not. GUI: graphical user interface; CUI: command-line user interface; Web (web server): web-based graphical user interface.

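In summary, to obtain mitochondrial genome sequences from whole-genome data with the above methods and software, mitochondrial reads are first captured from all reads with a whole-genome aligner, most commonly BWA or Bowtie/Bowtie2 [64]. These aligners can screen and filter mismatched or multiply mapped reads, and subsequent quality control yields clean mitochondrial reads. However, they cannot distinguish Numts from genuine mitochondrial sequences, which affects the detection of heteroplasmy. Moreover, these methods require the mitochondrial genome of a closely related species as the reference. If the reference comes from a distantly related species, reads may be misaligned during mapping, or highly divergent regions may fail to align and leave gaps, compromising the accuracy and completeness of the subsequent assembly [38]. Choosing an appropriate reference is therefore critical to these methods. When no close relative of the study species can be identified, or when a close relative is known but its mitochondrial genome is unavailable, the approach is severely limited [36,39].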

1.2 Assembly strategies and software based on mitochondrial fragments

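The strategies and software above, which use the complete mitochondrial genome of a close relative as the reference, are mostly designed for assembly, variant detection and variant annotation of human mitochondrial genomes. As research on other species has grown, mitochondrial genome analysis has also been widely applied to non-model species [65]. Software that accepts only a human reference is of limited use for obtaining and analysing the mitochondrial genomes of other species, so assemblers with a broader scope were urgently needed. The fragment-based strategy resembles direct mapping of all reads to a mitochondrial reference, but the reference can be the mitochondrial genome of a more distantly or more closely related species, or even a partial mitochondrial sequence. Filtered WGS reads are first mapped to the reference with a whole-genome aligner, and contiguous, highly covered mitochondrial reads form sequence blocks (bins). These bins, either individually or joined into contigs according to their overlaps, replace the original reference and serve as the bait sequence for the next round of mapping. WGS reads are mapped repeatedly to the newly generated bait, extending it each round until the full-length mitochondrial genome is reached (Fig. 1). Iterative mapping with replacement of the bait reduces bias from the reference sequence and the assembly method. During assembly, the k-mer size (the length K of the fixed-length subsequences into which reads are split) must be tuned and WGS reads are mapped to the bait repeatedly, which consumes substantial computational resources; the larger the raw data set, the greater the cost. The more distantly related the reference species, or the shorter the bait, the more extension iterations are needed and the longer the run time.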
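The iterative baiting idea can be illustrated without a full aligner. The sketch below is a deliberate simplification under stated assumptions: it recruits reads by exact shared k-mers rather than by mapping (MITObim itself relies on MIRA for mapping and assembly), considers only one strand, and does not assemble the recruited reads; `iterative_baiting` and its parameters are hypothetical names.

```python
import random
from typing import List, Set, Tuple


def kmers(seq: str, k: int) -> Set[str]:
    """All k-mers (substrings of length k) of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def iterative_baiting(seed: str, reads: List[str], k: int = 21,
                      max_rounds: int = 100) -> Tuple[Set[int], int]:
    """Recruit reads that share at least one k-mer with a growing bait.

    Round 1 uses only the seed's k-mers; the k-mers of every newly recruited
    read are then added to the bait, so each round can reach reads that
    overlap already-recruited reads by at least k bases. Stops when a round
    recruits nothing. Returns (recruited read indices, rounds that recruited).
    """
    bait = kmers(seed, k)
    read_kmers = [kmers(r, k) for r in reads]
    recruited: Set[int] = set()
    rounds = 0
    while rounds < max_rounds:
        new = {i for i, km in enumerate(read_kmers)
               if i not in recruited and km & bait}
        if not new:
            break
        recruited |= new
        for i in new:
            bait |= read_kmers[i]  # the bait grows with the recruited reads
        rounds += 1
    return recruited, rounds


if __name__ == "__main__":
    rng = random.Random(7)
    genome = "".join(rng.choice("ACGT") for _ in range(300))  # toy "mitogenome"
    reads = [genome[i:i + 40] for i in range(0, 261, 20)]     # 14 overlapping reads
    found, n = iterative_baiting(genome[:30], reads, k=15)
    print(len(found), "reads recruited in", n, "rounds")
```

Each round reaches only reads that overlap already-recruited reads by at least k bases, so the number of rounds grows with how far the bait must travel, mirroring the point above that short or distant baits need more iterations.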

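MITObim, developed by Hahn et al. [66], assembles the mitochondrial genomes of non-model species directly from WGS data and incorporates the MIRA and IMAGE modules. Compared with MIA, MITObim achieves accuracy above 99.5%, fills gaps effectively in repetitive regions, and has advantages in speed and memory use, making it the most widely used mitochondrial genome assembler. The software does not support paired-end (PE) reads but supports data from the Ion Torrent, 454 and PacBio platforms, and the raw data should contain no more than 20~40 million reads. If the data exceed this, a random subset of reads can be drawn to reduce the read count, although this may affect the accuracy and completeness of the assembly. MITObim still cannot solve some particularly difficult problems in mitochondrial genome assembly, such as Numts and the complex mitochondrial genomes of invertebrates and plants [67]. ARC [47] works similarly to MITObim: both can recover a complete mitochondrial genome using the mitochondrial genome, or a partial mitochondrial sequence, of a distantly related species. The main difference lies in how the sequence is extended: ARC assembles the bins directly, whereas MITObim repeatedly maps all reads to the bait sequence. Unlike whole-genome assemblers, ARC does not assemble all reads de novo; it first uses mapping to gather reads into overlapping bins and then assembles those bins, which keeps memory use low and runs fast. ARC is also largely insensitive to heavily degraded DNA and low-quality reads, as in aDNA, and runs faster than MITObim and traditional assembly methods [47]. Li et al. [68] used ARC to assemble the mitochondrial genomes of 19 Caenorhabditis species and tested the effect of sequencing platform (Roche 454, Illumina and Ion Torrent) on assembly. ARC crashed when analysing 454 data, possibly because the wide range of read lengths demanded large computational resources, but ARC assemblies were more complete than those of MITObim. However, when Dierckxsens et al. [38] used ARC to assemble the mitochondrial genome of the leaf beetle genus Gonioctena, they found that although ARC was highly accurate (99.99%), it could not assemble the mitochondrion into a single contig and completeness was poor (85.39% of the mitochondrial genome covered).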
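When a data set exceeds MITObim's recommended size, a uniform random subset can be drawn in a single pass without loading all reads into memory. The sketch below uses reservoir sampling (Algorithm R); the function name is hypothetical, and in practice a dedicated tool such as `seqtk sample` is commonly used for this step.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def reservoir_sample(items: Iterable[T], n: int, seed: int = 0) -> List[T]:
    """Draw a uniform random sample of n items from a stream (Algorithm R).

    Uses O(n) memory, so a FASTQ file with hundreds of millions of reads
    never has to be held at once. For paired-end data, pass (read1, read2)
    tuples so that mates stay together.
    """
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, item in enumerate(items):
        if i < n:
            reservoir.append(item)
        else:
            j = rng.randint(0, i)  # uniform over 0..i inclusive
            if j < n:
                reservoir[j] = item  # item i kept with probability n / (i + 1)
    return reservoir


sample = reservoir_sample(range(100_000), 5, seed=42)
print(len(sample))  # 5
```

A fixed `seed` makes the subsample reproducible; because the subset is random, some regions may end up with lower coverage, which is the accuracy and completeness risk noted above.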

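Dierckxsens et al. [38] developed NOVOPlasty, which, like the SSAKE [69] and VCAKE [70] algorithms, stores sorted reads in a hash table for fast retrieval and therefore runs quickly. NOVOPlasty requires a seed sequence, which may be a single short read, a coding gene sequence or even a complete mitochondrial genome. Unlike ARC, NOVOPlasty uses the seed to retrieve a single mitochondrial read from the WGS data and then extends that read in both directions. The authors compared NOVOPlasty with mainstream assemblers, including MITObim, MIRA, ARC, SOAPdenovo2 and CLCbio, and found that all except ARC assembled the mitochondrion into a single contig. Quality assessment of the NOVOPlasty assembly revealed no missing or ambiguous bases, indicating high accuracy and completeness. NOVOPlasty was the fastest and achieved the highest genome coverage; CLCbio likewise reached 100% accuracy, but its genome coverage was lower (89.96%). MIRA and ARC also achieved high genome coverage but had the lowest accuracy. Increasing sequencing coverage and read length improves the completeness and accuracy of NOVOPlasty, especially in highly repetitive and AT-rich regions. NOVOPlasty does not depend on other software or modules, so it is easy to install and run [38].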
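The hash-table idea behind seed-and-extend assemblers can be illustrated with a greedy k-mer walk. This is a simplified sketch, not NOVOPlasty's algorithm (NOVOPlasty extends with whole reads and handles heteroplasmy and repeats in its own way). It assumes error-free reads, indexes each k-mer with the bases that follow it on both strands, extends the seed one base at a time while a single base is clearly best supported, and stops at a tie, a dead end, or a revisited k-mer, which on a circular mitogenome means the circle has closed. All function names and thresholds are hypothetical.

```python
import random
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def build_index(reads: Iterable[str], k: int) -> Dict[str, Counter]:
    """Hash table: k-mer -> counts of the base that follows it (both strands)."""
    index: Dict[str, Counter] = defaultdict(Counter)
    for read in reads:
        for strand in (read, revcomp(read)):
            for i in range(len(strand) - k):
                index[strand[i:i + k]][strand[i + k]] += 1
    return index


def extend_right(contig: str, index: Dict[str, Counter], k: int,
                 min_support: int = 2, max_len: int = 100_000) -> Tuple[str, str]:
    """Greedily append the best-supported next base until extension is unsafe."""
    if k < 2 or len(contig) < k:
        raise ValueError("need k >= 2 and a seed at least k bases long")
    seen = {contig[i:i + k] for i in range(len(contig) - k + 1)}
    while len(contig) < max_len:
        counts = index.get(contig[-k:])  # .get does not create missing keys
        if not counts:
            return contig, "no_reads"
        ranked = counts.most_common(2)
        base, support = ranked[0]
        if support < min_support or (len(ranked) > 1 and ranked[1][1] == support):
            return contig, "ambiguous"  # too little support, or a tie
        next_kmer = contig[-k + 1:] + base
        if next_kmer in seen:
            return contig, "revisited"  # circle closed, or a repeat >= k bases
        contig += base
        seen.add(next_kmer)
    return contig, "max_len"


def extend_seed(seed: str, reads: Iterable[str], k: int = 21,
                min_support: int = 2) -> Tuple[str, Tuple[str, str]]:
    """Extend a seed to the right, then to the left via its reverse complement."""
    index = build_index(reads, k)
    right, why_right = extend_right(seed, index, k, min_support)
    left_rc, why_left = extend_right(revcomp(right), index, k, min_support)
    return revcomp(left_rc), (why_right, why_left)


if __name__ == "__main__":
    rng = random.Random(3)
    genome = "".join(rng.choice("ACGT") for _ in range(200))  # toy circular genome
    circ = genome + genome
    reads = [circ[i:i + 50] for i in range(0, 200, 5)]        # error-free reads
    contig, why = extend_seed(genome[50:80], reads, k=21)
    print(len(contig), why)
```

On a circular template the walk stops once it returns to the seed, so the contig spans the whole circle plus k - 1 overlapping bases that can be trimmed; a tie at a heteroplasmic site or a repeat longer than k would halt it earlier, which is one reason longer reads and higher coverage help in repetitive regions.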

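Assemblers developed for chloroplast genomes, including IOGA [71], GetOrganelle [72] and ORG.Asm [73], are also suitable for mitochondrial genomes. IOGA and GetOrganelle follow a workflow similar to MITObim's "baiting and iterative mapping". IOGA uses Bowtie2, SOAPdenovo2, SPAdes 3.0 [37] and other programs to capture mitochondrial reads, requires tuning the assembly k-mer size (range 37~97), and finally identifies the mitochondrial genome among the candidate contigs by assembly likelihood estimation (ALE) [74]. This approach suits the assembly of mitochondrial or chloroplast genomes from heavily degraded samples, such as museum specimens. Unlike other assemblers, IOGA uses the ALE test to screen the assembled contigs and selects the optimal assembly by maximum likelihood. GetOrganelle's workflow is very similar to IOGA's. GetOrganelle incorporates separate Bowtie2, BLAST [75] and SPAdes 3.0 modules and accepts both paired-end and single-end (SE) reads as input. GetOrganelle performs read error correction and mismatch filtering directly during SPAdes assembly, retaining high-quality reads for downstream analysis, whereas IOGA and MITObim require low-quality reads to be removed beforehand with other filtering software. Because both IOGA and GetOrganelle embed SPAdes, the k-mer size must be tuned repeatedly during assembly. Choosing an appropriate k not only ensures the completeness and accuracy of mitochondrial scaffolds or contigs but also reduces run time and memory use [72].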
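The trade-off in choosing k can be shown with two small calculations. A larger k removes repeated k-mers (repeats shorter than k no longer create branches in the assembly graph), but it also lowers the expected k-mer coverage, C_k = C(L - k + 1)/L for base coverage C and read length L. The helper names below are hypothetical, and the candidate k values are illustrative rather than what IOGA, GetOrganelle or SPAdes actually test.

```python
from typing import Iterable, Optional


def repeated_kmer_count(seq: str, k: int) -> int:
    """Number of distinct k-mers that occur more than once in seq."""
    seen, repeated = set(), set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in seen:
            repeated.add(kmer)
        seen.add(kmer)
    return len(repeated)


def smallest_repeat_free_k(seq: str, candidates: Iterable[int]) -> Optional[int]:
    """Smallest candidate k at which no k-mer in seq is repeated, else None."""
    for k in sorted(candidates):
        if repeated_kmer_count(seq, k) == 0:
            return k
    return None


def kmer_coverage(base_coverage: float, read_length: int, k: int) -> float:
    """Expected k-mer coverage C_k = C * (L - k + 1) / L for reads of length L."""
    return base_coverage * (read_length - k + 1) / read_length


print(repeated_kmer_count("ACGTAC", 2))             # 1 ("AC" occurs twice)
print(smallest_repeat_free_k("ACGTAC", [2, 3, 4]))  # 3
print(kmer_coverage(100, 150, 37), kmer_coverage(100, 150, 97))  # 76.0 36.0
```

For 150 bp reads at 100x base coverage, raising k from 37 to 97 roughly halves the effective k-mer coverage, which is why a larger k resolves more repeats only when coverage is high enough to support it.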

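Recently, with the development of PacBio and Nanopore single-molecule long-read sequencing, the genomes of a number of complex species, especially polyploid and highly repetitive ones, have been sequenced and used, demonstrating the advantages of long reads [27,76~80]. Assemblers for PacBio and Nanopore long reads have been developed, such as HGAP [81], Falcon (https://github.com/PacificBiosciences/falcon), Canu [82] and Sprai [83], but methods and algorithms for assembling mitochondrial and chloroplast genomes from these long reads are still scarce. Some researchers have already sequenced and assembled mitochondrial genomes directly on the PacBio and Nanopore platforms [25~29]. Organelle-PBA, developed in Perl by Soorni et al. [84], assembles mitochondrial or chloroplast genomes directly from PacBio whole-genome long reads. Installing and running Organelle-PBA requires several Perl modules and programs, including BlasR [85], SAMtools [86], BLAST [87], SSPACE-LongRead [88], Sprai and BEDTools [89]. Although PacBio and Nanopore produce much longer reads, their base error rates remain appreciable, so error-correction software such as Sprai is needed. Because these platforms require no random fragmentation or amplification during library preparation and produce long reads, a single read can span the entire mitochondrial genome, which greatly reduces the risk of Numt contamination. However, PacBio and Nanopore sequencing impose very strict requirements on DNA quality and integrity, which limits the use of Organelle-PBA.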

2 De novo assembly strategies and software

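Although whole-genome and mitochondrial genome data are being published for more and more species, the genomes of the vast majority of species remain unsequenced. For species without a reference genome, de novo assembly is a fast and accurate way to obtain genetic information and is widely used to assemble DNA and RNA sequences. De novo assembly of the mitochondrial genome resembles that of the nuclear genome: consensus sequences of short reads are first found in the massive WGS data to form contigs, the contigs are then ordered and joined using large-insert libraries of different sizes, and the sequence is finally extended to the scaffold level. Depending on the source of the mitochondrial reads, strategies can be divided into de novo assembly from WGS data and de novo assembly from transcriptome data (Fig. 1).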

2.1 De novo assembly of mitochondrial genomes from WGS data

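De novo assembly does not require a complete mitochondrial genome or a partial mitochondrial sequence as a reference. All WGS reads are first assembled de novo [47,48], so that nuclear and mitochondrial reads are each assembled into longer sequences; candidate mitochondrial contigs are then selected by strict filtering based on the expected mitochondrial genome length and high sequencing depth; finally, WGS reads are repeatedly mapped to the candidate contigs, extending them until they reach the full length of the mitochondrial genome (Fig. 1). Existing tools include Norgal [36] and MitoZ [39]. For species lacking a mitochondrial genome from a close relative, or for heavily degraded samples (e.g., aDNA), reference-based methods are severely limited. NGS followed by de novo mitochondrial assembly is therefore an effective strategy for aDNA or environmental DNA. However, this approach usually relies on whole-genome or transcriptome assemblers and modules (including SOAPdenovo2 [90], SPAdes [37], Velvet [91], BIGrat [92], CLCbio (https://www.qiagenbioinformatics.com/products/clc-assembly-cell), SOAPdenovo-Trans [93] and Trinity [94]) to assemble the entire data set, and the k-mer range must be tuned repeatedly to optimize assembly quality, so it is computationally expensive and slow.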

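Traditional de novo assemblers, including SOAPdenovo2, Newbler, SPAdes, Velvet, CLCbio, ALLPATHS [95] and Platanus [96], often filter out mitochondrial scaffolds or contigs during whole-genome assembly. De novo mitochondrial assembly uses these same assemblers but exploits, rather than discards, the high sequencing depth of mitochondrial reads. Complete mitochondrial genomes of many animals and plants have been obtained in this way. Lee et al. [97] performed low-coverage genome sequencing of two Campanulaceae species, Campanumoea javanica and dangshen (Codonopsis), and assembled their mitochondrial genomes. They first assembled all reads de novo with four whole-genome assemblers (Celera, SOAPdenovo, SPAdes and CLCbio) to obtain a pool of nuclear and mitochondrial contigs; they then identified candidate mitochondrial contigs from the difference in average sequencing depth between mitochondrial and nuclear contigs, mapped WGS reads back to the candidates, and repeated this cycle to extend the contigs until the complete mitochondrial genome was obtained [97]. Following a similar strategy, Norgal, developed by Al-Nakeeb et al. [36], first assembles NGS data de novo with MEGAHIT [98], then remaps the reads to the assembled contigs and identifies mitochondrial contig(s) by comparing mitochondrial and nuclear read coverage. Comparisons with mitochondrial assemblers based on other strategies showed that Norgal's accuracy was similar to NOVOPlasty's, but NOVOPlasty was much faster than both Norgal and MITObim, because Norgal must assemble the entire genome with different k-mer sizes and then align reads and compute nuclear read depth to assess the assembly [36].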
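The depth-based screening used by Lee et al. and by Norgal can be sketched as comparing each contig's depth with a genome-wide baseline. The sketch assumes contigs are already annotated with mean read depth (for example, after remapping), uses the median contig depth as a proxy for nuclear depth, and applies illustrative thresholds (10-fold depth, 1~25 kb length). These thresholds are assumptions, not the criteria of Norgal or any other tool, and plant mitogenomes, which are far larger, would need different length limits.

```python
from statistics import median
from typing import List, Tuple

Contig = Tuple[str, int, float]  # (name, length in bp, mean read depth)


def candidate_mito_contigs(contigs: List[Contig], fold: float = 10.0,
                           min_len: int = 1_000,
                           max_len: int = 25_000) -> List[str]:
    """Flag contigs whose depth far exceeds the typical (mostly nuclear) depth.

    The median contig depth serves as a proxy for nuclear depth, because the
    great majority of contigs in a WGS assembly are nuclear. Contigs longer
    than `max_len` are treated as too long for an animal mitogenome.
    """
    baseline = median(depth for _, _, depth in contigs)
    return [name for name, length, depth in contigs
            if depth >= fold * baseline and min_len <= length <= max_len]


contigs = [("n1", 50_000, 30.0), ("n2", 80_000, 28.0), ("n3", 20_000, 32.0),
           ("mt1", 16_000, 3000.0), ("mt_frag", 4_000, 2800.0),
           ("rep", 2_000, 900.0), ("numt", 5_000, 31.0)]  # hypothetical values
print(candidate_mito_contigs(contigs))  # ['mt1', 'mt_frag', 'rep']
```

High-copy nuclear repeats (and, in plants, chloroplast contigs) can pass a depth filter too, as the `rep` contig does here, so candidates are normally confirmed by similarity searches before they are extended.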

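As users' analytical needs grow, software offering streamlined and efficient workflows, comprehensive functionality and a good user experience has become increasingly necessary. MitoZ, developed by Meng et al. [39], provides "one-click" assembly, annotation and visualization of mitochondrial genomes. It comprises several modules covering raw-data preprocessing, de novo assembly, enrichment of candidate mitochondrial sequences, and mitochondrial genome annotation and visualization. Compared with other software, it filters out low-quality reads, reads with a large proportion of undetermined bases, and PCR duplicates from library preparation, ensuring the reliability of downstream analysis. MitoZ incorporates the SOAPdenovo-Trans algorithm to assemble the mitochondrial genome de novo from reads that also include nuclear reads. It relies on the fact that the average sequencing depth of mitochondrial reads is far higher than that of nuclear reads, and uses different k-mer settings to achieve the best assembly. The software offers two assembly modes: a quick mode and a multi-k-mer mode. The authors recommend using the multi-k-mer mode with different k values whenever possible to ensure the completeness and accuracy of complex mitochondrial genome assemblies. In terms of the number of genes assembled and the total sequence length, MitoZ outperforms reference-based strategies, particularly for genes with low interspecific similarity. Besides algorithmic differences between tools, repetitive sequences, AT content and the heteroplasmy rate (the proportion of heteroplasmic sites among all variable sites) are also key factors affecting the completeness and accuracy of mitochondrial genome assembly [39]. For annotation (BLAST, GeneWise, MiTFi and Infernal) and visualization (Circos), MitoZ integrates other mature software modules, which extends its functionality and greatly simplifies data analysis.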
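The preprocessing described for MitoZ can be sketched as three filters applied to each read. The thresholds (at most 10% N bases, a mean Phred quality of at least 20), the Phred+33 encoding and the exact-sequence duplicate check are assumptions for illustration, not MitoZ's defaults, and real duplicate removal for paired-end data considers both mates.

```python
from typing import Iterable, Iterator, Tuple

Record = Tuple[str, str, str]  # (read id, sequence, Phred+33 quality string)


def mean_quality(qual: str, offset: int = 33) -> float:
    """Mean Phred score of a quality string."""
    return sum(ord(c) - offset for c in qual) / len(qual)


def filter_reads(records: Iterable[Record], max_n_fraction: float = 0.1,
                 min_mean_quality: float = 20.0) -> Iterator[Record]:
    """Yield reads that pass N-content and quality filters.

    Exact sequence duplicates are dropped after their first occurrence,
    a crude stand-in for PCR-duplicate removal.
    """
    seen = set()
    for rid, seq, qual in records:
        if not seq:
            continue
        if seq.upper().count("N") / len(seq) > max_n_fraction:
            continue  # too many undetermined bases
        if mean_quality(qual) < min_mean_quality:
            continue  # low overall base quality
        if seq in seen:
            continue  # duplicate of a read already kept
        seen.add(seq)
        yield rid, seq, qual


reads = [("r1", "ACGT", "IIII"), ("r2", "ACGT", "IIII"),
         ("r3", "ANNN", "IIII"), ("r4", "GGCC", "####")]
print([rid for rid, _, _ in filter_reads(reads)])  # ['r1']
```

Here `r2` is removed as a duplicate of `r1`, `r3` for its N content and `r4` for low quality ('#' encodes Phred 2).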

2.2 De novo assembly of mitochondrial genomes from transcriptome data

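The development of NGS has also driven transcriptome-level research. Obtaining coding sequences from transcriptome data is now well established, and total RNA contains abundant transcripts of mitochondrially encoded genes, so researchers have developed software that efficiently extracts mitochondrial protein-coding sequences from transcriptome data. These methods rely on the high copy number of mitochondria in the cell: reads from mitochondrial mRNAs have far higher sequencing depth, reflecting high expression, than reads from nuclear-encoded genes. Plese et al. [99] developed Trimitomics, which rapidly and efficiently assembles mitochondrial protein-coding sequences from transcriptome reads. Its workflow comprises three independent assembly routes, NOVOPlasty, Bowtie2/Trinity and Velvet: (1) all RNA reads are first assembled de novo with NOVOPlasty over a range of k-mer sizes (25, 39, 45 and 51), and the completeness of the mitochondrial coding sequences is checked; (2) if no complete, or only partial, mitochondrial coding sequences are obtained, the raw RNA reads are filtered with Trimmomatic 0.33 [100], the filtered reads are mapped with Bowtie2 [64] to the mitochondrial genome of a closely related species, and the mapped reads are assembled de novo with Trinity [94,101]; (3) all reads are assembled de novo with Velvet, and mitochondrial contigs are identified with BLASTN [102]. If none of the three routes yields complete mitochondrial coding sequences, the results of all three are merged in Geneious and the merged sequences are checked by homology searches against the NCBI database. Applying Trimitomics to six invertebrates, the authors found that each of the three routes covered more than 97% of the mitochondrial protein-coding sequences. When the reliability of the NOVOPlasty, Bowtie2/Trinity and Velvet routes was evaluated by completeness and accuracy, their performance varied among species; for example, in two nemertean species the Bowtie2/Trinity route produced higher-quality mitochondrial coding sequences. In terms of run time and memory, the NOVOPlasty route performed best. Notably, Trimitomics provides three assembly routes and decides whether to run the next route based on the completeness of the current result; for complex mitochondrial genomes it can also merge the results of all three routes, increasing reliability.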
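The fall-back logic of Trimitomics (run one route, check completeness, try the next route, and merge the results if none is complete) can be sketched as follows. Completeness is simplified here to the presence of the 13 protein-coding genes found in most animal mitogenomes (a few lineages have lost some of them, such as ATP8); the route callables are hypothetical stand-ins for actually running NOVOPlasty, Bowtie2/Trinity or Velvet, and none of the names come from Trimitomics itself.

```python
from typing import Callable, FrozenSet, Iterable, List, Set, Tuple

# The 13 protein-coding genes found in most animal mitochondrial genomes.
ANIMAL_MT_PCGS: FrozenSet[str] = frozenset({
    "ATP6", "ATP8", "COX1", "COX2", "COX3", "CYTB",
    "ND1", "ND2", "ND3", "ND4", "ND4L", "ND5", "ND6",
})


def completeness(recovered: Iterable[str],
                 expected: FrozenSet[str] = ANIMAL_MT_PCGS) -> float:
    """Fraction of expected genes present in the recovered set."""
    return len(set(recovered) & expected) / len(expected)


def run_with_fallback(routes: List[Tuple[str, Callable[[], Set[str]]]],
                      expected: FrozenSet[str] = ANIMAL_MT_PCGS
                      ) -> Tuple[str, Set[str]]:
    """Run routes in order; stop at the first complete one, else merge all."""
    merged: Set[str] = set()
    for name, run in routes:
        genes = set(run()) & expected
        if genes == expected:
            return name, genes
        merged |= genes  # keep partial results for the final merge
    return "merged", merged


routes = [
    ("NOVOPlasty", lambda: set(ANIMAL_MT_PCGS) - {"ND6", "ATP8"}),  # hypothetical
    ("Bowtie2/Trinity", lambda: set(ANIMAL_MT_PCGS)),
    ("Velvet", lambda: set()),
]
print(run_with_fallback(routes)[0])  # Bowtie2/Trinity
```

Because later routes run only when earlier ones fall short, the cheap NOVOPlasty route handles easy cases, and the more expensive routes are reserved for difficult samples.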

3 Recommendations on assembly strategies and software

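When choosing a mitochondrial genome assembler, users should first decide between reference-based and de novo tools. If the genetic background of the target species is well known, a reference-based tool can be chosen; if the species lacks relevant genetic resources, and especially for aDNA, a de novo strategy is recommended. In addition, users should note the following: (1) understand the principles and applicability of each tool, especially the biases some tools have toward highly repetitive genomic regions; (2) the target species, i.e., human or non-model species; (3) different tools depend on different data types, so first determine whether the data are genomic or transcriptomic, long or short reads, and single-end or paired-end; (4) tools have different input-format requirements; (5) choose software according to the available computing resources and operating system. Many factors affect the completeness and accuracy of mitochondrial genome assembly: genome sequence features (e.g., repetitive elements and heteroplasmy), sequencing depth and sequencing technology (read length and base error rate) all pose challenges. Moreover, although assembly algorithms and software continue to improve, it remains difficult to distinguish mitochondrial reads from similar nuclear reads in WGS data, and problems such as Numt contamination [103] can cause conflicting results among assemblers and bias downstream inferences [104]. Notably, studies have reported that the completeness (e.g., the numbers of protein-coding, rRNA and tRNA genes) and accuracy of assembled mitochondrial genomes differ among species and assemblers [105]. If computing resources allow, several tools implementing different strategies should be used, and low-coverage regions, regions where tools disagree, and gaps should be validated by Sanger sequencing [105].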
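Regions that need Sanger validation can be located by comparing the assemblies produced by different tools. The sketch assumes the assemblies have already been rotated to a common start (mitogenomes are circular) and aligned to equal length, for instance with a multiple sequence aligner; the function name is hypothetical. It reports 0-based, half-open intervals where the assemblies disagree or contain N or gap characters.

```python
from typing import List, Optional, Sequence, Tuple


def discordant_intervals(aligned: Sequence[str]) -> List[Tuple[int, int]]:
    """Return 0-based half-open intervals where aligned assemblies disagree.

    A column is flagged when the assemblies carry different bases or when any
    of them has an ambiguous base (N) or an alignment gap (-).
    """
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("need one or more sequences of equal aligned length")
    intervals: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, column in enumerate(zip(*aligned)):
        bases = {b.upper() for b in column}
        flagged = len(bases) > 1 or bool(bases & {"N", "-"})
        if flagged and start is None:
            start = i  # open a new discordant run
        elif not flagged and start is not None:
            intervals.append((start, i))  # close the current run
            start = None
    if start is not None:
        intervals.append((start, len(aligned[0])))
    return intervals


assemblies = ["ACGTACGTAC",   # e.g. tool 1
              "ACGTTCGTAC",   # e.g. tool 2
              "ACGTACG-AC"]   # e.g. tool 3
print(discordant_intervals(assemblies))  # [(4, 5), (7, 8)]
```

Each reported interval, padded with flanking sequence, is a natural target for primer design and Sanger confirmation.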

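This review lists 19 tools for assembling mitochondrial genomes from WGS data (Table 1). Most of their source code and packages are hosted on GitHub, a web- and cloud-based service that makes code openly available and tracks and controls changes to it. Twelve of these tools run from the command line (CUI) under Linux, and users set run parameters in configuration text files or as command-line options. The command-line mode supports large-scale computation across platforms, for example by submitting jobs to large computing clusters; its drawback is that users must be familiar with many commands rather than simply operating with a mouse. The other mode is a web-based (web server, Web) or Windows graphical user interface (GUI), in which parameters are set with simple mouse operations, making it well suited to users unfamiliar with the software and to bioinformatics beginners.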

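Of the 19 tools listed, nine are written in Python or Perl (Table 1). Among the others, MIA is written in C/C++, and Norgal is written in the object-oriented language Java. These languages are portable, extensible and embeddable, and have rich libraries.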

4 Conclusions and perspectives

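Continuing advances in NGS have made whole-genome data for more and more species publicly available, and these data contain both mitochondrial and nuclear DNA. Even in the genomic era, mitochondrial genome research remains indispensable, for example in studies of species with complex social structures and sex-biased dispersal [13,106]. Such studies have driven both the explosive growth of mitochondrial genome data and the development of assembly strategies and software.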

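Mitochondrial genome assembly is a complex and rapidly developing field, and the techniques and methods for obtaining mitochondrial genomes need continual improvement. A good assembly strategy depends on the WGS data set, the available computing capacity and the availability of reference genomes. Successfully obtaining a high-quality mitochondrial genome also depends on many other factors, including the library preparation and sequencing platform and the structural features of the genome (repeat content, GC content, etc.) [107]. The data type also determines assembly quality, as with aDNA. Recent advances in sequencing and in aDNA extraction have propelled palaeogenomics, and bioinformatic methods are now used to assemble ancient mitochondrial genomes from WGS data. Because aDNA has been preserved for long periods in soil or in museums, it is degraded into short fragments; combined with uncertainty about the close relatives of excavated samples, this poses many challenges for assembling ancient mitochondrial genomes. As Meng et al. [39] pointed out, flexible and efficient software with a good user experience would let users devote more of their time and effort to biological questions rather than to obtaining mitochondrial genomes.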

[1] Brown WM, George M Jr., Wilson AC. Rapid evolution of animal mitochondrial DNA, 1979, 76(4): 1967–1971.

[2] Lei R, Frasier CL, Hawkins MT, Engberg SE, Bailey CA, Johnson SE, Mclain AT, Groves CP, Perry GH, Nash SD, Mittermeier RA, Louis EE. Phylogenomic reconstruction of Sportive Lemurs (genus Lepilemur) recovered from mitogenomes with inferences for Madagascar biogeography, 2017, 108(2): 107–119.

[3] Mueller RL, Macey JR, Jaekel M, Wake DB, Boore JL. Morphological homoplasy, life history evolution, and historical biogeography of plethodontid salamanders inferred from complete mitochondrial genomes, 2004, 101(38): 13820–13825.

[4] Zhang P, Chen YQ, Zhou H, Liu YF, Wang XL, Papenfuss TJ, Wake DB, Qu LH. Phylogeny, evolution, and biogeography of Asiatic Salamanders (Hynobiidae), 2006, 103(19): 7360–7365.

[5] Zhang P, Papenfuss TJ, Wake MH, Qu LH, Wake DB. Phylogeny and biogeography of the family Salamandridae (Amphibia: Caudata) inferred from complete mitochondrial genomes, 2008, 49(2): 586–597.

[6] Cerny V, Fernandes V, Costa MD, Hájek M, Mulligan CJ, Pereira L. Migration of Chadic speaking pastoralists within Africa based on population structure of Chad Basin and phylogeography of mitochondrial L3f haplogroup, 2009, 9: 63.

[7] Klimova A, Phillips CD, Fietz K, Olsen MT, Harwood J, Amos W, Hoffman JI. Global population structure and demographic history of the grey seal, 2014, 23(16): 3999–4017.

[8] Lin LH, Ji X, Diong CH, Du Y, Lin CX. Phylogeography and population structure of the Reevese's Butterfly Lizard (Leiolepis reevesii) inferred from mitochondrial DNA sequences, 2010, 56(2): 601–607.

[9] Miller W, Hayes VM, Ratan A, Petersen DC, Wittekindt NE, Miller J, Walenz B, Knight J, Qi J, Zhao F, Wang Q, Bedoya-Reina OC, Katiyar N, Tomsho LP, Kasson LM, Hardie RA, Woodbridge P, Tindall EA, Bertelsen MF, Dixon D, Pyecroft S, Helgen KM, Lesk AM, Pringle TH, Patterson N, Zhang Y, Kreiss A, Woods GM, Jones ME, Schuster SC. Genetic diversity and population structure of the endangered marsupial Sarcophilus harrisii (Tasmanian devil), 2011, 108(30): 12348–12353.

[10] Roslin T. Spatial population structure in a patchily distributed beetle, 2001, 10(4): 823–837.

[11] Teacher AG, André C, Merilä J, Wheat CW. Whole mitochondrial genome scan for population structure and selection in the Atlantic herring, 2012, 12: 248.

[12] Uren C, Kim M, Martin AR, Bobo D, Gignoux CR, Van Helden PD, Möller M, Hoal EG, Henn BM. Fine-scale human population structure in southern Africa reflects ecogeographic boundaries, 2016, 204(1): 303–314.

[13] Kuang WM, Ming C, Li HP, Wu H, Frantz L, Roos C, Zhang YP, Zhang CL, Jia T, Yang JY, Yu L. The origin and population history of the endangered golden snub-nosed monkey (Rhinopithecus roxellana), 2019, 36(3): 487–499.

[14] Haberman Y, Karns R, Dexheimer PJ, Schirmer M, Somekh J, Jurickova I, Braun T, Novak E, Bauman L, Collins MH, Mo A, Rosen MJ, Bonkowski E, Gotman N, Marquis A, Nistel M, Rufo PA, Baker SS, Sauer CG, Markowitz J, Pfefferkorn MD, Rosh JR, Boyle BM, Mack DR, Baldassano RN, Shah S, Leleiko NS, Heyman MB, Griffiths AM, Patel AS, Noe JD, Aronow BJ, Kugathasan S, Walters TD, Gibson G, Thomas SD, Mollen K, Shen-Orr S, Huttenhower C, Xavier RJ, Hyams JS, Denson LA. Ulcerative colitis mucosal transcriptomes reveal mitochondriopathy and personalized mechanisms underlying disease severity and treatment response, 2019, 10(1): 38.

[15] Inak G, Lorenz C, Lisowski P, Zink A, Mlody B, Prigione A. Concise review: induced pluripotent stem cell-based drug discovery for mitochondrial disease, 2017, 35(7): 1655–1662.

[16] Suomalainen A. Mitochondrial DNA and disease, 1997, 29(3): 235–246.

[17] Toda T. Molecular genetics of Parkinson's disease, 2007, 59(8): 815–823.

[18] Armbrust EV, Berges JA, Bowler C, Green BR, Martinez D, Putnam NH, Zhou S, Allen AE, Apt KE, Bechner M, Brzezinski MA, Chaal BK, Chiovitti A, Davis AK, Demarest MS, Detter JC, Glavina T, Goodstein D, Hadi MZ, Hellsten U, Hildebrand M, Jenkins BD, Jurka J, Kapitonov VV, Kröger N, Lau WW, Lane TW, Larimer FW, Lippmeier JC, Lucas S, Medina M, Montsant A, Obornik M, Parker MS, Palenik B, Pazour GJ, Richardson PM, Rynearson TA, Saito MA, Schwartz DC, Thamatrakoln K, Valentin K, Vardi A, Wilkerson FP, Rokhsar DS. The genome of the diatom Thalassiosira pseudonana: ecology, evolution, and metabolism, 2004, 306(5693): 79–86.

[19] Janzen DH, Burns JM, Cong Q, Hallwachs W, Dapkey T, Manjunath R, Hajibabaei M, Hebert PDN, Grishin NV. Nuclear genomes distinguish cryptic species suggested by their DNA barcodes and ecology, 2017, 114(31): 8313–8318.

[20] Zarowiecki MZ, Huyse T, Littlewood DT. Making the most of mitochondrial genomes--markers for phylogeny, molecular ecology and barcodes in Schistosoma (Platyhelminthes: Digenea), 2007, 37(12): 1401–1418.

[21] Hu M, Jex AR, Campbell BE, Gasser RB. Long PCR amplification of the entire mitochondrial genome from individual helminths for direct sequencing, 2007, 2(10): 2339–2344.

[22] Nabholz B, Jarvis ED, Ellegren H. Obtaining mtDNA genomes from next-generation transcriptome sequencing: a case study on the basal Passerida (Aves: Passeriformes) phylogeny, 2010, 57(1): 466–470.

[23] Timmermans MJ, Dodsworth S, Culverwell CL, Bocak L, Ahrens D, Littlewood DT, Pons J, Vogler AP. Why barcode? High-throughput multiplex sequencing of mitochondrial genomes for molecular systematics, 2010, 38(21): e197.

[24] Metzker ML. Sequencing technologies - the next generation, 2010, 11(1): 31–46.

[25] Lounsberry ZT, Brown SK, Collins PW, Henry RW, Newsome SD, Sacks BN. Next-generation sequencing workflow for assembly of nonmodel mitogenomes exemplified with North Pacific albatrosses (Phoebastria spp.), 2015, 15(4): 893–902.

[26] Shearman JR, Sonthirod C, Naktang C, Pootakham W, Yoocha T, Sangsrakru D, Jomchai N, Tragoonrung S, Tangphatsornruang S. The two chromosomes of the mitochondrial genome of a sugarcane cultivar: assembly and recombination analysis using long PacBio reads, 2016, 6: 31533.

[27] Kovar L, Nageswara-Rao M, Ortega-Rodriguez S, Dugas DV, Straub S, Cronn R, Strickler SR, Hughes CE, Hanley KA, Rodriguez DN, Langhorst BW, Dimalanta ET, Bailey CD. PacBio-based mitochondrial genome assembly of Leucaena trichandra (Leguminosae) and an intrageneric assessment of mitochondrial RNA editing, 2018, 10(9): 2501–2517.

[28] Wang SB, Song QW, Li SS, Hu ZG, Dong GQ, Song C, Huang HW, Liu YF. Assembly of a complete mitogenome of Chrysanthemum nankingense using Oxford Nanopore long reads and the diversity and evolution of Asteraceae mitogenomes, 2018, 9(11): 547.

[29] Gan HM, Linton SM, Austin CM. Two reads to rule them all: Nanopore long read-guided assembly of the iconic Christmas Island red crab, Gecarcoidea natalis (Pocock, 1888), mitochondrial genome and the challenges of AT-rich mitogenomes, 2019, 45: 64–71.

[30] Maughan PJ, Chaney L, Lightfoot DJ, Cox BJ, Tester M, Jellen EN, Jarvis DE. Mitochondrial and chloroplast genomes provide insights into the evolutionary origins of quinoa (Chenopodium quinoa Willd.), 2019, 9(1): 185.

[31] Mofiz E, Seemann T, Bahlo M, Holt D, Currie BJ, Fischer K, Papenfuss AT. Mitochondrial genome sequence of the Scabies Mite provides insight into the genetic diversity of individual scabies infections, 2016, 10(2): e0004384.

[32] Ni P, Bhuiyan AA, Chen JH, Li J, Zhang C, Zhao S, Du X, Li H, Yu H, Liu X, Li K. De novo assembly of mitochondrial genomes provides insights into genetic diversity and molecular evolution in wild boars and domestic pigs, 2018, 146(3): 277–285.

[33] Niu WT, Yu SG, Tian P, Xiao JG. Complete mitochondrial genome of Echinophyllia aspera (Scleractinia, Lobophylliidae): mitogenome characterization and phylogenetic positioning, 2018, 793: 1–14.

[34] Sahoo PK, Singh L, Sharma L, Kumar R, Singh VK, Ali S, Singh AK, Barat A. The complete mitogenome of brown trout (Salmo trutta fario) and its phylogeny, 2016, 27(6): 4563–4565.

[35] Shi YC, Liu Y, Zhang SZ, Zou R, Tang JM, Mu WX, Peng Y, Dong SS. Assembly and comparative analysis of the complete mitochondrial genome sequence of Sophora japonica 'JinhuaiJ2', 2018, 13(8): e0202485.

[36] Al-Nakeeb K, Petersen TN, Sicheritz-Pontén T. Norgal: extraction and de novo assembly of mitochondrial DNA from whole-genome sequencing data, 2017, 18(1): 510.

[37] Bankevich A, Nurk S, Antipov D, Gurevich AA, Dvorkin M, Kulikov AS, Lesin VM, Nikolenko SI, Pham S, Prjibelski AD, Pyshkin AV, Sirotkin AV, Vyahhi N, Tesler G, Alekseyev MA, Pevzner PA. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing, 2012, 19(5): 455–477.

[38] Dierckxsens N, Mardulyn P, Smits G. NOVOPlasty: de novo assembly of organelle genomes from whole genome data, 2017, 45(4): e18.

[39] Meng GL, Li YY, Yang CT, Liu SL. MitoZ: a toolkit for animal mitochondrial genome assembly, annotation and visualization, 2019, 47(11): e63.

[40] Bignell GR, Miller AR, Evans IH. Isolation of mitochondrial DNA, 1996, 53: 109–116.

[41] Li G, Davis BW, Eizirik E, Murphy WJ. Phylogenomic evidence for ancient hybridization in the genomes of living cats (Felidae), 2016, 26(1): 1–11.

[42] Yang QQ, Li ZH, Wu Y, Liu LJ. Advance and application of mtDNA COI barcodes on insects, Chinese Journal of Applied Entomology, 2012, 49(6): 1687–1695. (in Chinese)

[43] Sha M, Lin LL, Li XJ, Huang Y. Strategy and methods for sequencing mitochondrial genome, Chinese Journal of Applied Entomology, 2013, 50(1): 293–297. (in Chinese)

[44] Li TJ, Cao YX, Zhao HC, Yu Y, Qiao J. Research progress of sequencing method for animal mitochondrial genome, Tianjin Medical Journal, 2016, 44(6): 796–800. (in Chinese)

[45] Groenenberg DSJ, Harl J, Duijm E, Gittenberger E. The complete mitogenome of Orcula dolium (Draparnaud, 1801); ultra-deep sequencing from a single long-range PCR using the Ion-Torrent PGM, 2017, 154: 7.

[46] King JL, Larue BL, Novroski NM, Stoljarova M, Seo SB, Zeng X, Warshauer DH, Davis CP, Parson W, Sajantila A, Budowle B. High-quality and high-throughput massively parallel sequencing of the human mitochondrial genome using the Illumina MiSeq, 2014, 12: 128–135.

[47] Hunter SS, Lyon RT, Sarver BAJ, Hardwick K, Forney LJ, Settles ML. Assembly by Reduced Complexity (ARC): a hybrid approach for targeted assembly of homologous sequences, 2015: 014662.

[48] Machado DJ, Lyra ML, Grant T. Mitogenome assembly from genomic multiplex libraries: comparison of strategies and novel mitogenomes for five species of frogs, 2016, 16(3): 686–693.

[49] Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform, 2009, 25(14): 1754–1760.

[50] Min-Shan Ko A, Zhang YQ, Yang MA, Hu YB, Cao P, Feng XT, Zhang LZ, Wei FW, Fu QM. Mitochondrial genome of a 22,000-year-old giant panda from southern China reveals a new panda lineage, 2018, 28(12): R693–R694.

[51] Taylor RW, Turnbull DM. Mitochondrial DNA mutations in human disease, 2005, 6(5): 389–402.

[52] Torroni A, Achilli A, Macaulay V, Richards M, Bandelt HJ. Harvesting the fruit of the human mtDNA tree, 2006, 22(6): 339–345.

[53] Green RE, Malaspinas AS, Krause J, Briggs AW, Johnson PLF, Uhler C, Meyer M, Good JM, Maricic T, Stenzel U, Prüfer K, Siebauer M, Burbano HA, Ronan M, Rothberg JM, Egholm M, Rudan P, Brajković D, Kućan Z, Gušić I, Wikström M, Laakkonen L, Kelso J, Slatkin M, Pääbo S. A complete Neandertal mitochondrial genome sequence determined by high-throughput sequencing, 2008, 134(3): 416–426.

[54] Zhidkov I, Nagar T, Mishmar D, Rubin E. MitoBamAnnotator: A web-based tool for detecting and annotating heteroplasmy in human mitochondrial DNA sequences, 2011, 11(6): 924–928.

[55] Guo Y, Li J, Li CI, Shyr Y, Samuels DC. MitoSeek: extracting mitochondria information and performing high-throughput mitochondria sequencing analysis, 2013, 29(9): 1210–1211.

[56] Yang IS, Lee HY, Yang WI, Shin KJ. mtDNAprofiler: a Web application for the nomenclature and comparison of human mitochondrial DNA sequences, 2013, 58(4): 972–980.

[57] Vellarikkal SK, Dhiman H, Joshi K, Hasija Y, Sivasubbu S, Scaria V. mit-o-matic: a comprehensive computational pipeline for clinical evaluation of mitochondrial variations from next-generation sequencing datasets, 2015, 36(4): 419–424.

[58] Calabrese C, Simone D, Diroma MA, Santorsola M, Guttà C, Gasparre G, Picardi E, Pesole G, Attimonelli M. MToolBox: a highly automated pipeline for heteroplasmy annotation and prioritization analysis of human mitochondrial variants in high-throughput sequencing, 2014, 30(21): 3115–3117.

[59] Navarro-Gomez D, Leipzig J, Shen L, Lott M, Stassen AP, Wallace DC, Wiggs JL, Falk MJ, Van Oven M, Gai X. Phy-Mer: a novel alignment-free and reference-independent mitochondrial haplogroup classifier, 2015, 31(8): 1310–1312.

[60] Weissensteiner H, Forer L, Fuchsberger C, Schöpf B, Kloss-Brandstätter A, Specht G, Kronenberg F, Schönherr S. mtDNA-Server: next-generation sequencing data analysis of human mitochondrial DNA in the cloud, 2016, 44(W1): W64–69.

[61] Ishiya K, Ueda S. MitoSuite: a graphical tool for human mitochondrial genome profiling in massive parallel sequencing, 2017, 5: e3406.

[62] Krzywinski M, Schein J, Birol I, Connors J, Gascoyne R, Horsman D, Jones SJ, Marra MA. Circos: an information aesthetic for comparative genomics, 2009, 19(9): 1639–1645.

[63] Mckenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, Garimella K, Altshuler D, Gabriel S, Daly M, Depristo MA. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data, 2010, 20(9): 1297–1303.

[64] Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2, 2012, 9(4): 357–359.

[65] Mardis ER. The impact of next-generation sequencing technology on genetics, 2008, 24(3): 133–141.

[66] Hahn C, Bachmann L, Chevreux B. Reconstructing mitochondrial genomes directly from genomic next-generation sequencing reads--a baiting and iterative mapping approach, 2013, 41(13): e129.

[67] Hahn C. Assembly of ancient mitochondrial genomes without a closely related reference sequence, 2019, 1963: 195–213.

[68] Li R, Ren X, Bi Y, Ding Q, Ho VWS, Zhao Z. Comparative mitochondrial genomics reveals a possible role of a recent duplication of NADH dehydrogenase subunit 5 in gene regulation, 2018, 25(6): 577–586.

[69] Warren RL, Sutton GG, Jones SJ, Holt RA. Assembling millions of short DNA sequences using SSAKE, 2007, 23(4): 500–501.

[70] Jeck WR, Reinhardt JA, Baltrus DA, Hickenbotham MT, Magrini V, Mardis ER, Dangl JL, Jones CD. Extending assembly of short DNA sequences to handle error, 2007, 23(21): 2942–2944.

[71] Bakker FT, Lei D, Yu JY, Mohammadin S, Wei Z, Van De Kerke S, Gravendeel B, Nieuwenhuis M, Staats M, Alquezar-Planas DE, Holmer R. Herbarium genomics: plastome sequence assembly from a range of herbarium specimens using an iterative organelle genome assembly pipeline, 2016, 117(1): 33–43.

[72] Jin JJ, Yu WB, Yang JB, Song Y, dePamphilis CW, Yi TS, Li DZ. GetOrganelle: a simple and fast pipeline for de novo assembly of a complete circular chloroplast genome using genome skimming data, 2018: 256479.

[73] Coissac E, Hollingsworth PM, Lavergne S, Taberlet P. From barcodes to genomes: extending the concept of DNA barcoding, 2016, 25(7): 1423–1428.

[74] Clark SC, Egan R, Frazier PI, Wang Z. ALE: a generic assembly likelihood evaluation framework for assessing the accuracy of genome and metagenome assemblies, 2013, 29(4): 435–443.

[75] Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, Madden TL. BLAST+: architecture and applications, 2009, 10: 421.

[76] Bayliss SC, Hunt VL, Yokoyama M, Thorpe HA, Feil EJ. The use of Oxford Nanopore native barcoding for complete genome assembly, 2017, 6(3): 1–6.

[77] Cao MD, Nguyen SH, Ganesamoorthy D, Elliott AG, Cooper MA, Coin LJ. Scaffolding and completing genome assemblies in real-time with nanopore seque-ncing, 2017, 8: 14515.

[78] Deschamps S, Zhang Y, Llaca V, Ye L, Sanyal A, King M, May G, Lin HN. A chromosome-scale assembly of the sorghum genome using nanopore sequencing and optical mapping, 2018, 9(1): 4844.

[79] Goodwin S, Gurtowski J, Ethe-Sayers S, Deshpande P, Schatz MC, Mccombie WR. Oxford Nanopore sequencing, hybrid error correction, and de novo assembly of a eukaryotic genome, 2015, 25(11): 1750–1756.

[80] Lin MM, Qi XJ, Chen JY, Sun LM, Zhong YP, Fang JB, Hu CG. The complete chloroplast genome sequence of Actinidia arguta using the PacBio RS II platform, 2018, 13(5): e0197393.

[81] Chin CS, Alexander DH, Marks P, Klammer AA, Drake J, Heiner C, Clum A, Copeland A, Huddleston J, Eichler EE, Turner SW, Korlach J. Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data, 2013, 10(6): 563–569.

[82] Koren S, Walenz BP, Berlin K, Miller JR, Bergman NH, Phillippy AM. Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation, 2017, 27(5): 722–736.

[83] Miyamoto M, Motooka D, Gotoh K, Imai T, Yoshitake K, Goto N, Iida T, Yasunaga T, Horii T, Arakawa K, Kasahara M, Nakamura S. Performance comparison of second- and third-generation sequencers using a bacterial genome with two chromosomes, 2014, 15(1): 699.

[84] Soorni A, Haak D, Zaitlin D, Bombarely A. Organelle_PBA, a pipeline for assembling chloroplast and mitochondrial genomes from PacBio DNA sequencing data, 2017, 18(1): 49.

[85] Chaisson MJ, Tesler G. Mapping single molecule sequencing reads using basic local alignment with successive refinement (BLASR): application and theory, 2012, 13: 238.

[86] Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R, 1000 Genome Project Data Processing Subgroup. The sequence alignment/map format and SAMtools, 2009, 25(16): 2078–2079.

[87] Mcginnis S, Madden TL. BLAST: at the core of a powerful and diverse set of sequence analysis tools, 2004, 32(Web Server issue): W20–25.

[88] Boetzer M, Pirovano W. SSPACE-LongRead: scaffolding bacterial draft genomes using long read sequence information, 2014, 15: 211.

[89] Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features, 2010, 26(6): 841–842.

[90] Luo R, Liu B, Xie Y, Li Z, Huang W, Yuan J, He G, Chen Y, Pan Q, Liu Y, Tang J, Wu G, Zhang H, Shi Y, Liu Y, Yu C, Wang B, Lu Y, Han C, Cheung DW, Yiu SM, Peng S, Xiaoqian Z, Liu G, Liao X, Li Y, Yang H, Wang J, Lam TW, Wang J. SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler, 2012, 1(1): 18.

[91] Zerbino DR, Birney E. Velvet: algorithms for de novo short read assembly using de Bruijn graphs, 2008, 18(5): 821–829.

[92] Zhang TW, Luo YF, Chen YP, Li XN, Yu J. BIGrat: a repeat resolver for pyrosequencing-based re-sequencing with Newbler, 2012, 5: 567.

[93] Xie Y, Wu G, Tang J, Luo R, Patterson J, Liu S, Huang W, He G, Gu S, Li S, Zhou X, Lam TW, Li Y, Xu X, Wong GK, Wang J. SOAPdenovo-Trans: de novo transcriptome assembly with short RNA-Seq reads, 2014, 30(12): 1660–1666.

[94] Haas BJ, Papanicolaou A, Yassour M, Grabherr M, Blood PD, Bowden J, Couger MB, Eccles D, Li B, Lieber M, Macmanes MD, Ott M, Orvis J, Pochet N, Strozzi F, Weeks N, Westerman R, William T, Dewey CN, Henschel R, Leduc RD, Friedman N, Regev A. De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis, 2013, 8(8): 1494–1512.

[95] Maccallum I, Przybylski D, Gnerre S, Burton J, Shlyakhter I, Gnirke A, Malek J, Mckernan K, Ranade S, Shea TP, Williams L, Young S, Nusbaum C, Jaffe DB. ALLPATHS 2: small genomes assembled accurately and with high continuity from short paired reads, 2009, 10(10): R103.

[96] Kajitani R, Yoshimura D, Okuno M, Minakuchi Y, Kagoshima H, Fujiyama A, Kubokawa K, Kohara Y, Toyoda A, Itoh T. Platanus-allee is a de novo haplotype assembler enabling a comprehensive access to divergent heterozygous regions, 2019, 10(1): 1702.

[97] Lee HO, Choi JW, Baek JH, Oh JH, Lee SC, Kim CK. Assembly of the mitochondrial genome in the Campanulaceae family using Illumina low-coverage sequencing, 2018, 9(8): 383.

[98] Li D, Liu CM, Luo R, Sadakane K, Lam TW. MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph, 2015, 31(10): 1674–1676.

[99] Plese B, Rossi ME, Kenny NJ, Taboada S, Koutsouveli V, Riesgo A. Trimitomics: an efficient pipeline for mitochondrial assembly from transcriptomic reads in non-model species, 2018, 19(5): 1230–1239.

[100] Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data, 2014, 30(15): 2114–2120.

[101] Grabherr MG, Haas BJ, Yassour M, Levin JZ, Thompson DA, Amit I, Adiconis X, Fan L, Raychowdhury R, Zeng Q, Chen Z, Mauceli E, Hacohen N, Gnirke A, Rhind N, Di Palma F, Birren BW, Nusbaum C, Lindblad-Toh K, Friedman N, Regev A. Full-length transcriptome assembly from RNA-Seq data without a reference genome, 2011, 29(7): 644–652.

[102] Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool, 1990, 215(3): 403–410.

[103] Li M, Schroeder R, Ko A, Stoneking M. Fidelity of capture-enrichment for mtDNA genome sequencing: influence of NUMTs, 2012, 40(18): e137.

[104] Li Y, Li X, Chen Y. Research summary of mitochondria pseudogene. Journal of Mianyang Normal University, 2012, 31(05): 68–75 (in Chinese).

[105] Velozo Timbó R, Coiti Togawa R, Costa MMC, Andow DA, Paula DP. Mitogenome sequence accuracy using different elucidation methods, 2017, 12(6): e0179971.

[106] Peters JL, Bolender KA, Pearce JM. Behavioural vs. molecular sources of conflict between nuclear and mitochondrial DNA: the role of male-biased dispersal in a Holarctic sea duck, 2012, 21(14): 3562–3575.

[107] Ekblom R, Smeds L, Ellegren H. Patterns of sequencing coverage bias revealed by ultra-deep sequencing of vertebrate mitochondria, 2014, 15: 467.

Mitogenome assembly strategies and software applications in the genome era

Weimin Kuang, Li Yu

With rapid advances in next-generation sequencing technologies, the genomes of many organisms have been sequenced and widely applied in different settings. Mitochondrial genome data are equally important, and high-throughput whole-genome data typically contain mitochondrial genome (mitogenome) sequences. How to extract and assemble the mitogenome from massive whole-genome sequencing (WGS) data remains an active research area in molecular biology, genetics and medicine. The accumulation and analysis of mitogenome data have driven the development of strategies and software for assembling mitochondrial DNA from WGS data. Mitogenome assembly strategies can be divided into reference-based (mitogenome-reference) strategies and de novo strategies. Each strategy has its own advantages and limitations, depending on how mitogenome-derived short reads are baited from the WGS data and on the subsequent assembly approach. In this review, we summarize and compare current mitogenome assembly strategies and the available software. We also offer suggestions on choosing among these strategies and software tools, with the aim of providing a methodological reference for mitogenome research in the life sciences.

whole-genome sequencing; mitogenome; mitogenome-reference assembly; de novo assembly; assembly software

2019-08-07;

2019-09-25

Supported by the National Natural Science Foundation of China (No. 31872213), the Industrialization Cultivation Project of the Scientific Research Fund of Yunnan Education Department (No. 2016CYH02) and the Academic Graduate Students Foundation of Yunnan Province.

Weimin Kuang, Ph.D., major: genetics. E-mail: kuangwm0714@sina.com

Li Yu, Ph.D., professor, research interests: animal genetics and evolution. E-mail: yuli@ynu.edu.cn

10.16288/j.yczz.19-227

2019/10/29 16:37:23

URI: http://kns.cnki.net/kcms/detail/11.1913.R.20191029.1041.001.html

(Responsible editor: Dongdong Wu)
