遗传 (Hereditas (Beijing)) ›› 2014, Vol. 36 ›› Issue (4): 387-394. doi: 10.3724/SP.J.1005.2014.0387

• Techniques and Methods •

A new method for expanding biological pathways based on knowledge of protein interactions

赵小蕾1, 左晓宇2, 覃继恒1, 梁岩3, 张乃尊3, 栾奕昭1, 饶绍奇1,2

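  1. Institute for Medical Systems Biology and School of Public Health, Guangdong Medical College, Dongguan 523808;
    2. School of Public Health, Sun Yat-Sen University, Guangzhou 510080;
    3. Maoming People's Hospital, Maoming 525000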
  • Received: 2013-09-23 Revised: 2013-12-04 Publication date: 2014-04-20 Release date: 2014-03-26
  • Corresponding author: Shaoqi Rao, professor and doctoral supervisor; research interests: statistical genetics and bioinformatics. E-mail: raoshaoq@gdmc.edu.cn
  • About the author: Xiaolei Zhao, assistant laboratory technician; research interest: bioinformatics. E-mail: zhaoxiaolei0715@163.com
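  • Supported by:

    Methods for genome-wide genetic pathway analysis of complex diseases; Association of TNFalpha and NOD pathway genes with the progression from diabetes to coronary heart disease, and identification of transitional molecular markers; Molecular markers for early diagnosis and personalized medicine of coronary heart disease; Early diagnosis and clinical genomics models of coronary heart disease; Systems biology of complex diseases; Susceptibility genes and early diagnostic molecular markers for coronary heart disease; Key discipline of epidemiology and health statistics; Pathway-based study of genetic heterogeneity in human complex diseases; Development of data-mining-based software for gene interaction analysis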

A novel biological pathway expansion method based on the knowledge of protein-protein interactions

Xiaolei Zhao1, Xiaoyu Zuo2, Jiheng Qin1, Yan Liang3, Naizun Zhang3, Yizhao Luan1, Shaoqi Rao1,2   

  1. Institute for Medical Systems Biology and School of Public Health, Guangdong Medical College, Dongguan 523808, China; 
    2. School of Public Health, Sun Yat-Sen University, Guangzhou 510080, China; 
    3. Maoming People’s Hospital, Maoming 525000, China
  • Received:2013-09-23 Revised:2013-12-04 Online:2014-04-20 Published:2014-03-26
  • Key words: protein-protein interaction, Gene Ontology, enrichment analysis, pathway attribution, prediction

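Abstract:

Biological pathways are widely used in studies of gene function, but current pathway knowledge is incomplete and needs further expansion. Bioinformatics prediction offers an effective and economical route to pathway expansion. This article proposes a new method for predicting the pathways of a gene by combining protein-protein interaction knowledge with information from the Gene Ontology (GO) database. First, the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways that contain the target gene's neighbors at the protein-protein interaction level are selected as candidate pathways; the pathway attribution of the target gene is then determined by testing whether the genes in each candidate pathway are enriched in the GO nodes associated with the target gene. Predictions were made separately with protein-protein interaction information from the Human Protein Reference Database (HPRD) and the Biological General Repository for Interaction Datasets (BioGRID). The results show that, in both datasets, the average accuracy (the proportion of all pathways annotated to the target genes that were successfully predicted) and the relative accuracy (among genes with at least one annotated pathway successfully predicted, the proportion of genes whose annotated pathways were all predicted correctly) both rose as the number of interacting neighbors increased. When the number of interacting neighbors reached 22, the average accuracy reached 96.2% (HPRD) and 96.3% (BioGRID), and the relative accuracy was 93.3% (HPRD) and 84.1% (BioGRID). Further validation against a newer database release, using 89 genes whose annotations had been updated since the older release, showed that 50 genes had at least one updated pathway predicted correctly; for 43 of them all updated pathways were predicted correctly, a relative accuracy of 86.0%. These results indicate that the method is a reliable and effective approach to pathway expansion.

Key words: protein-protein interaction, Gene Ontology, enrichment analysis, pathway attribution, prediction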
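
The two-step procedure summarized in the abstract (candidate pathways from interacting neighbors, then a GO enrichment test) can be illustrated with a minimal Python sketch. Several details here are assumptions, not taken from the article: the abstract does not state which enrichment statistic, significance threshold, or multiple-testing correction was used, so the one-sided hypergeometric test, the cutoff `alpha=0.05`, the function names, and the toy gene, pathway, and GO labels are all hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) when n genes are drawn from N, of which K carry the GO term."""
    hi = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, hi + 1)) / comb(N, n)


def predict_pathways(target, ppi, pathways, go, alpha=0.05):
    """Return candidate pathways enriched in at least one GO term of `target`.

    ppi:      dict gene -> set of interacting neighbor genes
    pathways: dict pathway name -> set of member genes
    go:       dict GO term -> set of annotated genes
    The hypergeometric test and alpha are assumptions of this sketch.
    """
    # Background: every GO-annotated gene except the target itself.
    background = set().union(*go.values()) - {target}
    target_terms = [t for t, genes in go.items() if target in genes]
    neighbors = ppi.get(target, set())
    # Step 1: candidate pathways are those containing an interacting neighbor.
    candidates = [p for p, genes in pathways.items() if genes & neighbors]
    predicted = []
    for p in candidates:
        members = pathways[p] & background
        # Step 2: test each GO term of the target for enrichment in the pathway.
        for t in target_terms:
            annotated = go[t] & background
            k = len(members & annotated)
            if k == 0:
                continue
            pval = hypergeom_upper_tail(k, len(background), len(annotated), len(members))
            if pval < alpha:
                predicted.append(p)
                break
    return predicted


# Toy data (entirely made up): target "T" shares GO:A with g1..g4.
toy_go = {
    "GO:A": {"T", "g1", "g2", "g3", "g4"},
    "GO:B": {f"g{i}" for i in range(5, 21)},
}
toy_pathways = {"P1": {"g1", "g2", "g3", "g5"}, "P2": {"g6", "g7", "g8", "g9"}}
toy_ppi = {"T": {"g1", "g6"}}

print(predict_pathways("T", toy_ppi, toy_pathways, toy_go))  # → ['P1']
```

In this toy case both P1 and P2 are candidates because each contains a neighbor of T, but only P1 shares enough genes with the target's GO term to pass the test (3 of its 4 members, p = 65/4845, about 0.013).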

Abstract:

Biological pathways have been widely used in gene function studies; however, current knowledge of biological pathways is incomplete and needs to be further expanded. Bioinformatics prediction provides an inexpensive but effective way to expand pathways. Here, we propose a novel method for biological pathway prediction that integrates prior knowledge of protein-protein interactions with the Gene Ontology (GO) database. First, the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways to which the interacting neighbors of a target gene (at the level of protein-protein interaction) belong are chosen as the candidate pathways. Then, the pathways to which the target gene belongs are determined by testing whether the genes in each candidate pathway are enriched in the GO terms to which the target gene is annotated. Protein-protein interaction data from the Human Protein Reference Database (HPRD) and the Biological General Repository for Interaction Datasets (BioGRID) were each used to predict the pathway attribution(s) of the target gene. The results demonstrated that both the average accuracy (the ratio of correctly predicted pathways to the total number of pathways to which all the target genes were annotated) and the relative accuracy (among the genes with at least one annotated pathway successfully predicted, the percentage of genes with all annotated pathways correctly predicted) increased with the number of interacting neighbors. When the number of interacting neighbors reached 22, the average accuracy was 96.2% (HPRD) and 96.3% (BioGRID), and the relative accuracy was 93.3% (HPRD) and 84.1% (BioGRID). Further validation on 89 genes whose pathway knowledge was updated in a new database release indicated that 50 genes were correctly predicted for at least one updated pathway, and 43 of these were correctly predicted for all updated pathways, giving an estimated relative accuracy of 86.0%. These results demonstrate that the proposed approach is a reliable and effective method for pathway expansion.
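
The two accuracy measures defined in the abstract can be made concrete with a short Python sketch. The function name `accuracy_metrics` and the toy annotations are hypothetical; only the definitions and the 43-of-50 validation count come from the abstract.

```python
def accuracy_metrics(annotated, predicted):
    """Compute (average accuracy, relative accuracy) as defined in the abstract.

    annotated: dict gene -> set of pathways the gene is annotated to
    predicted: dict gene -> set of pathways predicted for the gene
    """
    total = hit = partial = complete = 0
    for gene, truth in annotated.items():
        found = truth & predicted.get(gene, set())
        total += len(truth)      # all annotated pathways over all genes
        hit += len(found)        # annotated pathways that were recovered
        if found:
            partial += 1         # gene with at least one pathway recovered
            if found == truth:
                complete += 1    # gene with every annotated pathway recovered
    average = hit / total if total else float("nan")
    relative = complete / partial if partial else float("nan")
    return average, relative


# Toy example (made-up genes and pathways).
toy_annotated = {"A": {"p1", "p2"}, "B": {"p3"}, "C": {"p4"}}
toy_predicted = {"A": {"p1"}, "B": {"p3", "p9"}, "C": set()}
print(accuracy_metrics(toy_annotated, toy_predicted))  # → (0.5, 0.5)

# Validation figure from the abstract: 43 fully correct among 50 partly correct genes.
print(round(43 / 50 * 100, 1))  # → 86.0
```

Note that relative accuracy is conditioned on genes with at least one hit, so the 39 of 89 updated genes for which no updated pathway was recovered do not enter the 86.0% figure.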