Hereditas (Beijing)

• Research Article •

Construction of a six-gene prognostic model for glioblastoma based on multi-omics data from The Cancer Genome Atlas

Lei Changgui, Jia Xueyuan, Sun Wenjing

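  1. Laboratory of Medical Genetics, Harbin Medical University
  • Received: 2020-12-11 Revised: 2021-04-04 Published: 2021-04-07 Released online: 2021-04-07
  • Corresponding author: Sun Wenjing
  • Funding:
    Innovative Research Team program of the Ministry of Education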

Establishing a six-gene prognostic model for glioblastoma based on multi-omics data from the TCGA database

Lei Changgui1, Sun Wenjing1

  • Received: 2020-12-11 Revised: 2021-04-04 Online: 2021-04-07 Published: 2021-04-07
  • Contact: Sun Wenjing
  • Supported by:
    Program for Changjiang Scholars and Innovative Research Team in University

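Abstract: Glioblastoma (GBM) is the most common primary intracranial tumor; it is highly malignant and patient prognosis is very poor. To identify prognostic biomarkers for GBM and build a prognostic model, this study analyzed GBM expression profile data from The Cancer Genome Atlas (TCGA) database and screened for genes differentially expressed between GBM patients with different survival times. GISTIC software and Kaplan-Meier (KM) survival analysis were used to analyze GBM copy number variation data from TCGA and identify survival-associated amplified genes (SAGs). The intersection of the genes up-regulated in the short-survival group and the SAGs was taken, and univariate Cox regression and iterative Lasso regression were applied to select important candidate genes and build the prognostic model. A prognostic score was calculated, and patients were divided into high-risk and low-risk groups at the median score. ROC curves were used to assess model performance, KM survival analysis was used to compare prognosis between the high-risk and low-risk groups, and the model was validated in three external datasets from the GEO, CGGA, and Rembrandt databases. Multivariate Cox regression was used to assess whether the prognostic score was an independent prognostic factor. The differential analysis between survival groups yielded 426 up-regulated genes and 65 down-regulated genes. Intersecting the genes up-regulated in the short-survival group with the SAGs gave 47 genes. After screening, a six-gene (EN2, PPBP, LRRC61, SEL1L3, CPA4, DDIT4L) prognostic model was established. The area under the ROC curve exceeded 0.6 in the TCGA experimental group and in all three external validation groups, reaching as high as 0.912. KM analysis showed a difference in prognosis between the high-risk and low-risk groups in every dataset (P<0.05). In multivariate Cox regression, the six-gene prognostic score was an independent factor affecting the prognosis of GBM patients (P<0.05). Through this series of analyses, the study established a six-gene (EN2, PPBP, LRRC61, SEL1L3, CPA4, DDIT4L) prognostic model for GBM; the model has good predictive ability and can serve as an independent prognostic marker for GBM patients.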

Key words: glioblastoma, multi-omics data, six-gene combination, prognostic model, The Cancer Genome Atlas

Abstract: Glioblastoma (GBM) is the most common primary intracranial tumor, with extremely high malignancy and poor prognosis. To identify prognostic biomarkers for GBM and establish a prognostic model, we analyzed the GBM expression profile data in The Cancer Genome Atlas (TCGA) database as the experimental group. First, we identified genes differentially expressed between GBM patients with different survival periods. GISTIC software and Kaplan-Meier (KM) survival analysis were then used to analyze the GBM copy number variation data and identify survival-associated amplified genes (SAGs). We took the intersection of the genes up-regulated in the short-survival group and the SAGs, and applied univariate Cox regression and iterative Lasso regression to these genes to identify important candidates and establish a prognostic model. A prognostic score was calculated from the model, and patients were divided into high-risk and low-risk groups at the median score. ROC curves were used to evaluate the performance of the model, and KM survival analysis was used to compare the prognosis of the high-risk and low-risk groups. Multivariate Cox regression analysis was used to determine whether the prognostic score was an independent prognostic factor. The model was validated in three external datasets: GEO GSE16011, CGGA, and Rembrandt. Differential expression analysis between survival groups identified 426 up-regulated genes and 65 down-regulated genes in the TCGA GBM dataset. The intersection of the genes up-regulated in the short-survival group and the SAGs yielded 47 genes. After screening, a prognostic model based on six genes (EN2, PPBP, LRRC61, SEL1L3, CPA4, DDIT4L) was established. The area under the ROC curve of the model was greater than 0.6 in the TCGA experimental group and in all three external validation groups, reaching as high as 0.912. KM analysis showed that the prognosis of the high-risk and low-risk groups was significantly different (P<0.05). In the multivariate Cox regression analysis, the six-gene prognostic score was an independent factor influencing the prognosis of GBM patients (P<0.05). In summary, this study established a six-gene (EN2, PPBP, LRRC61, SEL1L3, CPA4, DDIT4L) prognostic model for GBM. The model has good predictive ability and could be used as an independent prognostic marker for GBM patients.
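
The scoring and grouping steps summarized in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: it assumes the prognostic score is a weighted sum of the six genes' expression values, which is the usual form for a Cox/Lasso model but is not stated in the abstract. The coefficient values, the toy expression values, the survival labels, and the function names are all hypothetical. The AUC is the plain rank-based (Mann-Whitney) estimate on a binary short-survival label and ignores censoring, so it only approximates the ROC analysis a survival study would run.

```python
from statistics import median

GENES = ["EN2", "PPBP", "LRRC61", "SEL1L3", "CPA4", "DDIT4L"]

# Hypothetical coefficients, for illustration only.
# The abstract does not report the fitted values.
COEFS = {"EN2": 0.5, "PPBP": 0.25, "LRRC61": 0.5,
         "SEL1L3": 0.25, "CPA4": 0.5, "DDIT4L": 0.25}


def prognostic_score(expression, coefs):
    """Assumed form of the score: sum of coefficient * expression per gene."""
    return sum(coefs[gene] * expression[gene] for gene in coefs)


def split_by_median(scores):
    """Label each score 'high' if it is above the cohort median, else 'low'."""
    cutoff = median(scores)
    return ["high" if s > cutoff else "low" for s in scores]


def roc_auc(scores, labels):
    """Rank-based AUC: chance that a positive case outscores a negative one.

    Ties count as half a win. Censoring is not handled.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy cohort of four patients with made-up expression values.
patients = [
    {"EN2": 2.0, "PPBP": 0.0, "LRRC61": 0.0, "SEL1L3": 0.0, "CPA4": 0.0, "DDIT4L": 0.0},
    {"EN2": 2.0, "PPBP": 4.0, "LRRC61": 0.0, "SEL1L3": 0.0, "CPA4": 0.0, "DDIT4L": 0.0},
    {"EN2": 2.0, "PPBP": 4.0, "LRRC61": 2.0, "SEL1L3": 0.0, "CPA4": 0.0, "DDIT4L": 0.0},
    {"EN2": 2.0, "PPBP": 4.0, "LRRC61": 2.0, "SEL1L3": 4.0, "CPA4": 0.0, "DDIT4L": 0.0},
]
# 1 = short survival, 0 = long survival (made-up labels).
short_survival = [0, 1, 0, 1]

scores = [prognostic_score(p, COEFS) for p in patients]
groups = split_by_median(scores)
auc = roc_auc(scores, short_survival)

print(scores)  # [1.0, 2.0, 3.0, 4.0]
print(groups)  # ['low', 'low', 'high', 'high']
print(auc)     # 0.75
```

In a real analysis the coefficients would come from the fitted Cox model, and the high-risk and low-risk groups would then be compared with a KM analysis and log-rank test in each dataset.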

Key words: glioblastoma, multi-omics data, six-gene combination, prognostic model, The Cancer Genome Atlas