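CAO Lin, HAN Li, YANG Ming, HAN Bin, LIU Fu
·Original Article·
Identification of key genes associated with poor prognosis in breast cancer by bioinformatic analysis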
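Department of Pharmacy, Affiliated Hospital of North Sichuan Medical College, Nanchong 637000, China
To screen core genes associated with poor prognosis in breast cancer by bioinformatic analysis of gene chip data from the Gene Expression Omnibus (GEO) database, and to provide new candidate targets for breast cancer treatment.
The microarray dataset GSE15852 was downloaded from the GEO database, and differentially expressed genes (DEGs) were screened with the GEO online tool GEO2R. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses of the DEGs were performed with the DAVID database. A protein-protein interaction (PPI) network was constructed with the STRING database and Cytoscape software, and module analysis with the Cytoscape MCODE plug-in was used to obtain key genes. Survival analysis of these key genes was performed with the online tool Kaplan-Meier Plotter to identify core genes associated with poor prognosis in breast cancer, and the results were further validated by Gene Expression Profiling Interactive Analysis (GEPIA).
A total of 57 DEGs were identified, comprising 17 up-regulated and 40 down-regulated genes. The up-regulated genes were mainly enriched in biological processes such as response to estrogen, negative regulation of cell motility, cardiac right ventricle morphogenesis, sympathetic nervous system development, cell-cell adhesion, and ureteric bud development, and in the hematopoietic cell lineage pathway. The down-regulated genes were significantly enriched in biological processes such as lipid metabolism, catabolism, and storage, cholesterol storage and transport, triglyceride synthesis and catabolism, and angiogenesis, and in the PPAR signaling pathway, regulation of lipolysis in adipocytes, and the adipocytokine signaling pathway. PPI network and MCODE module analysis identified 7 core genes. Survival analysis and GEPIA analysis of these key genes showed that patients with high expression of CD24 and EPCAM had lower survival than patients with low expression.
This approach provides a basis for finding key genes associated with poor prognosis in breast cancer and for exploring new therapeutic targets.
Breast cancer; Bioinformatics; GEO database; Differentially expressed genes; Key genes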
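Breast cancer is among the malignant tumors with the highest incidence in women and is the leading cause of cancer death among women worldwide. Globally, about 2.1 million women were newly diagnosed with breast cancer in 2018, accounting for nearly one quarter of cancer cases in women[1]. With advances in screening and treatment, the survival of breast cancer patients has improved. However, the 5-year relative survival rate is reported to approach 100% for patients diagnosed at stage I but to fall to 26% for those diagnosed at stage IV, indicating that survival in advanced breast cancer remains poor[2]. Conventional treatments such as surgery, chemotherapy, and radiotherapy do not provide satisfactory outcomes for patients with advanced breast cancer[3]. In addition, tumor heterogeneity leads to large differences in treatment outcome between individuals. The pathogenesis of breast cancer is complex, and its underlying molecular mechanisms are not yet fully understood. There is therefore an urgent need for more specific and more economical biomarkers to predict the prognosis of breast cancer, and for targets that support better treatment strategies and a better understanding of the underlying mechanisms.
In recent years, microarrays based on high-throughput platforms have become an effective tool for screening important genetic or epigenetic alterations in carcinogenesis and for finding promising biomarkers for cancer diagnosis and prognosis[4]. Bioinformatic analysis of gene chip data integrates large volumes of complex biological information through data screening, statistical analysis, visualization, molecular interaction networks, and pathway analysis, mining potential biomarkers and providing new strategies for the treatment of disease.
Given the high incidence and mortality of breast cancer in women, it has been studied extensively. Early diagnosis and molecular targeted therapy urgently require identification of the key genes that determine the progression, metastasis, and poor prognosis of breast cancer. Li et al.[5] showed that the expression of LAPTM4B, VEGF, and nuclear survivin correlates significantly with various clinicopathological features and with prognosis in breast cancer patients, and that these proteins can be regarded as therapeutic targets. Several molecular biomarkers of poor prognosis in breast cancer have been reported; for example, KPNA2 contributes to aberrant localization of key proteins and to poor prognosis[6], and RASSF1A methylation predicts poor prognosis in female breast cancer[7]. This study analyzed gene chip data from the Gene Expression Omnibus (GEO) to explore key differentially expressed genes associated with poor prognosis in breast cancer, with the aim of gaining further biological information on the molecular mechanisms related to prognosis and providing new targets for treatment.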
The GEO database of the National Center for Biotechnology Information (NCBI) is a free public repository of gene expression data (https://www.ncbi.nlm.nih.gov/geoprofiles/). The breast cancer microarray dataset GSE15852 was downloaded from GEO. Chip information: [HG-U133A] Affymetrix Human Genome U133A Array; ID: 20097481; platform: GPL96. The dataset contains gene expression data for 43 breast cancer tissues and 43 normal breast tissues.
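1.2.1 Differential expression analysis  Differentially expressed genes (DEGs) between breast cancer specimens and normal breast specimens were screened with the GEO2R online tool, using the criteria |logFC| > 2 (FC, fold change) and adjusted P value < 0.01. DEGs with logFC < 0 were classed as down-regulated and DEGs with logFC > 0 as up-regulated.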
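The screening rule in 1.2.1 can be sketched in a few lines of Python. This is an illustrative sketch only, not the procedure run inside GEO2R: the column names (`Gene.symbol`, `logFC`, `adj.P.Val`) are assumed to match a downloaded GEO2R result table, and the four rows are invented placeholders, not values from GSE15852.

```python
import csv
import io

# Hypothetical excerpt of a GEO2R-style result table. Gene names and numbers
# are invented for illustration and are NOT results from GSE15852.
GEO2R_TSV = """Gene.symbol\tlogFC\tadj.P.Val
GENE_A\t2.81\t0.0004
GENE_B\t-3.10\t0.00002
GENE_C\t1.20\t0.0001
GENE_D\t-2.40\t0.03
"""


def classify_degs(tsv_text, lfc_cutoff=2.0, padj_cutoff=0.01):
    """Split rows into up- and down-regulated DEGs using |logFC| and adjusted P cutoffs."""
    up, down = [], []
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        log_fc = float(row["logFC"])
        adj_p = float(row["adj.P.Val"])
        # Keep a gene only if it passes both thresholds.
        if abs(log_fc) > lfc_cutoff and adj_p < padj_cutoff:
            # logFC > 0 means up-regulated, logFC < 0 means down-regulated.
            (up if log_fc > 0 else down).append(row["Gene.symbol"])
    return up, down


up_genes, down_genes = classify_degs(GEO2R_TSV)
print("up-regulated:", up_genes)
print("down-regulated:", down_genes)
```

In this toy table, GENE_C fails the fold-change cutoff and GENE_D fails the adjusted P cutoff, so only one gene lands in each group.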
1.2.2 Gene Ontology and Kyoto Encyclopedia of Genes and Genomes pathway enrichment analysis  DAVID is an online bioinformatics tool for functional annotation of genes and proteins and for functional gene set enrichment. The screened DEGs were submitted to DAVID 6.8 (https://david.ncifcrf.gov) for Gene Ontology (GO) functional annotation, covering biological process (BP), molecular function (MF), and cellular component (CC), and for Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis to find the key signaling pathways in which the DEGs were enriched. P < 0.05 was considered statistically significant.
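The idea behind term enrichment can be illustrated with a plain hypergeometric over-representation test. This sketch is an assumption-laden simplification: DAVID reports a more conservative modified Fisher exact P value (the EASE score), which is not reproduced here, and the counts in the example call are hypothetical rather than taken from this study.

```python
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """Return P(X >= k) for a hypergeometric draw.

    N: genes in the background, K: background genes annotated to the term,
    n: genes in the submitted list, k: list genes annotated to the term.
    """
    total = comb(N, n)
    upper = min(n, K)
    # Sum the probabilities of seeing k, k+1, ..., upper annotated genes.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


# Hypothetical counts: 5 of 40 submitted genes fall in a 70-gene pathway,
# against a background of 20000 genes.
p_value = hypergeom_upper_tail(k=5, n=40, K=70, N=20000)
print(f"enrichment P = {p_value:.3g}")
print("significant at 0.05:", p_value < 0.05)
```

A small P value means that so many list genes sharing the term would be unlikely under random sampling from the background, which is the sense in which a term is called enriched.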
1.2.3 Protein-protein interaction network and module analysis  A protein-protein interaction (PPI) network of the breast cancer DEGs was constructed with the STRING 11.0 database (https://string-db.org/cgi/input.pl) at a confidence setting of 0.04. The PPI network was visualized in Cytoscape 3.7.2, and module analysis with the Cytoscape MCODE plug-in was used to screen key genes.
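1.2.4 Survival analysis  Overall survival (OS) analysis of the key genes screened by MCODE in section 1.2.3 was performed using the survival data for breast cancer samples in the online database Kaplan-Meier Plotter (https://kmplot.com/analysis/). A log-rank P < 0.05 was considered statistically significant. Genes associated with poorer survival in breast cancer patients were further validated by Gene Expression Profiling Interactive Analysis (GEPIA) to obtain the key genes associated with poor prognosis in breast cancer.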
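For readers unfamiliar with the survival curves behind 1.2.4, the Kaplan-Meier product-limit estimator can be sketched as follows. This is a minimal illustration, not the Kaplan-Meier Plotter implementation: the follow-up times and event flags are invented, and the log-rank comparison between high- and low-expression groups is not implemented here.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one observed death.

    times: follow-up time per patient; events: 1 = death observed, 0 = censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients whose follow-up ends at the same time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set after t.
        at_risk -= removed
    return curve


# Hypothetical follow-up (months) for five patients in one expression group.
curve = kaplan_meier(times=[5, 8, 12, 20, 30], events=[1, 1, 0, 1, 0])
for t, s in curve:
    print(f"t = {t:>2}  S(t) = {s:.2f}")
```

Censored patients do not lower the curve themselves, but they shrink the risk set, so later deaths produce larger drops.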
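The gene chip dataset GSE15852 was analyzed with the GEO online tool GEO2R. Using |logFC| > 2 and adjusted P < 0.01 as criteria, 57 DEGs were identified, comprising 17 up-regulated and 40 down-regulated genes (Table 1).
GO functional enrichment analysis and KEGG pathway enrichment analysis of the 57 DEGs were performed on the DAVID website; the GO analysis covered biological process (BP), cellular component (CC), and molecular function (MF). The up-regulated genes were significantly enriched in biological processes such as response to estrogen, negative regulation of cell motility, cardiac right ventricle morphogenesis, sympathetic nervous system development, cell-cell adhesion, and ureteric bud development, and mainly in the hematopoietic cell lineage pathway. The down-regulated genes were significantly enriched in biological processes such as lipid metabolism, catabolism, and storage, cholesterol storage and transport, triglyceride synthesis and catabolism, and angiogenesis, and in the PPAR signaling pathway, regulation of lipolysis in adipocytes, and the adipocytokine signaling pathway (Figure 1 and Table 2).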
Table 1 Differentially expressed genes (17 up-regulated and 40 down-regulated genes) (P < 0.01)
Figure 1 Gene Ontology enrichment analysis of differentially expressed genes (A: Up-regulated genes; B: Down-regulated genes; P < 0.05)
Table 2 KEGG pathway enrichment analysis of differentially expressed genes (P < 0.05)
Figure 2 PPI network and module analysis of the differentially expressed genes (A: Visualization of the PPI network of the DEGs; B: Key module; up-regulated genes are marked in red, down-regulated genes in blue, and the lines show interactions between DEGs)
The PPI network was constructed with the STRING 11.0 online database. Of the 57 DEGs, 37 (14 up-regulated and 23 down-regulated genes) were retained in the PPI network complex, which comprised 37 nodes and 86 edges; the remaining 20 DEGs were not included in the network (Figure 2A). Key modules were then identified with the Cytoscape MCODE plug-in. With Node Score Cutoff = 0.2, K-Core = 2, and Max. Depth = 100 as criteria, 7 central nodes were identified among the 37 nodes, all of them up-regulated genes (Figure 2B).
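The K-Core = 2 setting can be illustrated with a small k-core pruning routine. This sketch covers only that one ingredient: MCODE additionally weights nodes by local density and grows modules from seed nodes under the score cutoff, none of which is reproduced here. The node names and edges are hypothetical and do not come from the PPI network in this study.

```python
def k_core(edges, k):
    """Return the nodes of the k-core: the subgraph left after repeatedly
    removing every node with fewer than k neighbours."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    changed = True
    while changed:
        changed = False
        # Snapshot the low-degree nodes, then remove them one by one.
        for node in [n for n, nbrs in adj.items() if len(nbrs) < k]:
            for nbr in adj.pop(node):
                adj[nbr].discard(node)
            changed = True
    return set(adj)


# Hypothetical toy network: a triangle (A, B, C) with a two-node tail (D, E).
toy_edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D"), ("D", "E")]
core_nodes = k_core(toy_edges, k=2)
print(sorted(core_nodes))
```

Removing E leaves D with a single neighbour, so D is pruned in the next pass; only the densely connected triangle survives, which is the kind of tightly linked cluster a module analysis looks for.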
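Prognostic information for the 7 core candidate genes was obtained from the free online Kaplan-Meier Plotter database. Three genes were associated with significantly worse overall survival, whereas the other 4 genes showed no significant difference in overall survival (P < 0.05; Table 3 and Figure 3). The GEPIA website was then used to further verify the expression of the 3 genes with significant differences in overall survival in cancer patients versus normal controls. Two genes (CD24 and EPCAM) were expressed at higher levels in breast cancer (BRCA) tissues than in normal breast tissues (P < 0.01, Figure 4). Patients with high expression had worse survival than patients with low expression.
Table 3 Prognostic information for the 7 core genes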
Figure 3 Prognostic value of the core genes by Kaplan-Meier Plotter (Logrank P < 0.05 was considered statistically significant)
Figure 4 Gene expression profiling interactive analysis further verified the expression levels of the core genes CD24 (A) and EPCAM (B) in BRCA samples compared with normal samples (red box: cancer tissue group; gray box: normal tissue group; *P < 0.01; dots represent expression in each sample)
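Breast cancer is a heterogeneous disease that can be divided into four major molecular subtypes according to the expression of estrogen receptor (ER), progesterone receptor (PR), human epidermal growth factor receptor 2 (HER2), and the proliferation marker Ki67 (MKI67)[8-10]. At present, prognosis prediction for breast cancer patients is based mainly on this classification and on conventional clinicopathological features such as histological grade, histological type, and TNM stage. In clinical practice, however, tumor heterogeneity makes it very difficult to predict treatment response and prognosis. In recent years, considerable progress has been made in understanding the molecular mechanisms involved in breast cancer biology. With the emergence of newly approved gene therapy strategies[11], a growing number of oncogenes are being tested as therapeutic targets for cancer. Genetic alterations have been confirmed to play a key role in the development and progression of breast cancer[12-15]. More effective molecular biomarkers therefore need to be explored for the prevention, diagnosis, and treatment of breast cancer.
This study used the GEO gene chip dataset GSE15852 for a bioinformatic analysis of genes differentially expressed between breast cancer tissue and normal breast tissue, and identified 57 DEGs, comprising 17 up-regulated and 40 down-regulated genes. GO and KEGG pathway analyses showed that these genes were mainly enriched in pathways such as cell motility and lipid metabolism. The influence of lipids on cancer has been widely studied and differs between cancer types[16]. Seven core genes were extracted by PPI network and MCODE analysis. Subsequent survival analysis and GEPIA analysis showed that CD24 and EPCAM were expressed at higher levels in cancer than in normal tissue and that high expression was associated with lower patient survival. CD24 is a glycosylphosphatidylinositol-linked glycoprotein expressed in many cell types, including cancer cells. CD24 has been shown to play a role in the development and progression of multiple tumors, including lung, prostate, and ovarian cancer[17]. CD24 overexpression has been reported to be critical for the inactivation of mutant p53 protein in cancer cells[18]. Our results indicate that CD24 is a key gene in breast cancer treatment and may serve as a promising therapeutic target and prognostic marker, in agreement with previous reports[19-20]. We aim to determine the intracellular location of CD24 and whether that location affects tumor phenotype and patient prognosis, so that optimal CD24-directed therapy can eventually be developed. EPCAM is a transmembrane glycoprotein whose overexpression is thought to be associated with enhanced proliferation and malignancy in various tumors[21]. Baccelli et al.[22] and Sadeghi et al.[23] reported that EPCAM overexpression increases the metastasis rate in breast cancer patients, suggesting that EPCAM may be a potential biomarker of malignancy, which is consistent with our results. EPCAM is an important overexpressed gene in breast cancer and can be evaluated as a prognostic factor.
In summary, through an integrated bioinformatic analysis of a GEO dataset, this study found 2 core genes associated with the progression and prognosis of BRCA. These findings may offer useful information and direction on potential biomarkers and biological mechanisms in BRCA and serve as a reference for developing effective diagnostic and therapeutic strategies.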
[1] Bray F, Ferlay J, Soerjomataram I, et al. Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin, 2018, 68(6):394-424.
[2] Miller KD, Nogueira L, Mariotto AB, et al. Cancer treatment and survivorship statistics, 2019. CA Cancer J Clin, 2019, 69(5):363-385.
[3] Zheng S, Li M, Miao K, et al. SNHG1 contributes to proliferation and invasion by regulating miR-382 in breast cancer. Cancer Manag Res, 2019, 11:5589-5598.
[4] Song E, Song W, Ren M, et al. Identification of potential crucial genes associated with carcinogenesis of clear cell renal cell carcinoma. J Cell Biochem, 2018, 119(7):5163-5174.
[5] Li S, Wang L, Meng Y, et al. Increased levels of LAPTM4B, VEGF and survivin are correlated with tumor progression and poor prognosis in breast cancer patients. Oncotarget, 2017, 8(25):41282-41293.
[6] Alshareeda AT, Negm OH, Green AR, et al. KPNA2 is a nuclear export protein that contributes to aberrant localisation of key proteins and poor prognosis of breast cancer. Br J Cancer, 2015, 112(12):1929-1937.
[7] Buhmeida A, Merdad A, Al-Maghrabi J, et al. RASSF1A methylation is predictive of poor prognosis in female breast cancer in a background of overall low methylation frequency. Anticancer Res, 2011, 31(9):2975-2981.
[8] Sorlie T, Tibshirani R, Parker J, et al. Repeated observation of breast tumor subtypes in independent gene expression data sets. Proc Natl Acad Sci U S A, 2003, 100(14):8418-8423.
[9] Sørlie T, Wang Y, Xiao C, et al. Distinct molecular mechanisms underlying clinically relevant subtypes of breast cancer: gene expression analyses across three different platforms. BMC Genomics, 2006, 7:127.
[10] Goldhirsch A, Wood WC, Coates AS, et al. Strategies for subtypes-dealing with the diversity of breast cancer: highlights of the St. Gallen International Expert Consensus on the Primary Therapy of Early Breast Cancer 2011. Ann Oncol, 2011, 22(8):1736-1747.
[11] Stein CA, Castanotto D. FDA-approved oligonucleotide therapies in 2017. Mol Ther, 2017, 25(5):1069-1075.
[12] Buys SS, Sandbach JF, Gammon A, et al. A study of over 35,000 women with breast cancer tested with a 25-gene panel of hereditary cancer genes. Cancer, 2017, 123(10):1721-1730.
[13] Li G, Guo X, Tang L, et al. Analysis of BRCA1/2 mutation spectrum and prevalence in unselected Chinese breast cancer patients by next-generation sequencing. J Cancer Res Clin Oncol, 2017, 143(10):2011-2024.
[14] Cybulski C, Carrot-Zhang J, Kluźniak W, et al. Germline RECQL mutations are associated with breast cancer susceptibility. Nat Genet, 2015, 47(6):643-646.
[15] Dan X, Pu C, Mengjiao H, et al. An integrated bioinformatical analysis to evaluate the role of KIF4A as a prognostic biomarker for breast cancer. Onco Targets Ther, 2018, 11:4755-4768.
[16] Zhang J, Zhou YJ, Yu ZH, et al. Identification of core genes and clinical roles in pregnancy-associated breast cancer based on integrated analysis of different microarray profile datasets. Biosci Rep, 2019, 39(6):BSR20190019.
[17] Lee JH, Kim SH, Lee ES, et al. CD24 overexpression in cancer development and progression: a meta-analysis. Oncol Rep, 2009, 22(5):1149-1156.
[18] Wang L, Liu R, Ye P, et al. Intracellular CD24 disrupts the ARF–NPM interaction and enables mutational and viral oncogene-mediated p53 inactivation. Nat Commun, 2015, 6:5909-5919.
[19] Suyama K, Onishi H, Imaizumi A, et al. CD24 suppresses malignant phenotype by downregulation of SHH transcription through STAT1 inhibition in breast cancer cells. Cancer Lett, 2016, 374(1):44-53.
[20] Zhang P, Zheng P, Liu Y. Amplification of the CD24 gene is an independent predictor for poor prognosis of breast cancer. Front Genet, 2019, 10:560.
[21] van der Gun BT, Melchers LJ, Ruiters MHJ, et al. EpCAM in carcinogenesis: the good, the bad or the ugly. Carcinogenesis, 2010, 31(11):1913-1921.
[22] Baccelli I, Schneeweiss A, Riethdorf S, et al. Identification of a population of blood circulating tumor cells from breast cancer patients that initiates metastasis in a xenograft assay. Nat Biotechnol, 2013, 31(6):539-544.
[23] Sadeghi S, Hojati Z, Tabatabaeian H. Cooverexpression of EpCAM and c-myc genes in malignant breast tumours. J Genet, 2017, 96(1):109-118.
Identification of key genes with poor prognosis in breast cancer by bioinformatical analysis
CAO Lin, HAN Li, YANG Ming, HAN Bin, LIU Fu
Department of Pharmacy, Affiliated Hospital of North Sichuan Medical College, Nanchong 637000, China
Bioinformatic methods were used to analyze gene chip data from the Gene Expression Omnibus (GEO) database in order to screen core genes related to poor prognosis in breast cancer and to provide new candidate targets for its treatment.
The microarray dataset GSE15852 was downloaded from the GEO database, and differentially expressed genes (DEGs) were screened using the GEO online tool GEO2R. Gene Ontology and Kyoto Encyclopedia of Genes and Genomes pathway enrichment analyses of the selected DEGs were then performed using the online database DAVID. A protein-protein interaction (PPI) network was constructed with the STRING database and Cytoscape software, and key genes were obtained by module analysis with the MCODE plug-in of Cytoscape. Overall survival analysis was then performed with Kaplan-Meier Plotter to screen core genes related to the prognosis of breast cancer, and the genes were further validated by gene expression profiling interactive analysis (GEPIA).
A total of 57 DEGs were identified in BRCA in the dataset. The 17 up-regulated genes were largely enriched in the biological processes response to estrogen, negative regulation of cell motility, cardiac right ventricle morphogenesis, sympathetic nervous system development, cell-cell adhesion, viral process, and ureteric bud development, and in the hematopoietic cell lineage signaling pathway. The 40 down-regulated genes were specifically enriched in the biological processes lipid metabolic process, lipid transport, lipid storage, positive regulation of cholesterol storage, cholesterol transport, triglyceride catabolic process, and angiogenesis, and in the PPAR signaling pathway, regulation of lipolysis in adipocytes, and the adipocytokine signaling pathway. Extraction of key modules from the PPI network with the Cytoscape MCODE plug-in selected 7 genes, all of them up-regulated. In addition, survival analysis and gene expression profiling interactive analysis of the key genes showed that the survival rate of patients with high expression of the CD24 and EPCAM genes was lower than that of patients with low expression.
This study provides a basis for finding the key genes with poor prognosis of breast cancer and exploring new targets for the treatment of breast cancer.
Breast cancer; Bioinformatics; GEO database; Differentially expressed genes; Key genes
LIU Fu, Email: nclf91@163.com
Sichuan Provincial Department of Education Project (18ZB0219)
10.3969/j.issn.1673-713X.2020.04.012
2020-02-26