Hereditas (Beijing), 2019, Vol. 41, Issue 11: 1041-1049. doi: 10.16288/j.yczz.19-155

• Research Article •

MHC-I epitope presentation prediction based on transfer learning

Weipeng Hu1,2,3, Youping Li2,3,4, Xiuqing Zhang2,3,4

  1. School of Biology and Biological Engineering, South China University of Technology, Guangzhou 510006, China
  2. BGI-Shenzhen, Shenzhen 518083, China
  3. BGI-GenoImmune, Wuhan 4300794, China
  4. BGI Education Center, University of Chinese Academy of Sciences, Shenzhen 518083, China
  • Received: 2019-06-21 Revised: 2019-09-17 Online: 2019-11-20 Published: 2019-10-08
  • Contact: Xiuqing Zhang, E-mail: zhangxq@genomics.cn
  • About the authors: Weipeng Hu, master's student, specializing in genomics. E-mail: huweipeng@genomics.cn | Youping Li, master's student, specializing in genomics. E-mail: liyouping@genomics.cn
    Weipeng Hu and Youping Li contributed equally as co-first authors.
  • Supported by:
    the National Natural Science Foundation of China (Nos. 81702826 and 81772910); the Science, Technology and Innovation Commission of Shenzhen Municipality (No. JCYJ20170303151334808); and the Shenzhen Municipal Government of China (No. 20170731162715261)

Abstract:

Accurate prediction of epitope presentation is a key step in selecting T cell-specific epitopes for neoantigen-based tumour immunotherapy. Epitopes identified by mass spectrometry (MS) are valuable for training epitope presentation prediction models. Although MS data are accumulating at an accelerating pace, the number of epitopes available for most human leukocyte antigen (HLA) alleles remains relatively small, which makes it difficult to build reliable prediction models. This study therefore applied transfer learning: a model was first trained on epitope data pooled across alleles to learn features common to presented epitopes, and this pre-trained model was then trained further on allele-specific epitopes to obtain the final epitope presentation prediction model, termed Pluto. On the same validation set, the average 0.1% positive predictive value (PPV) of Pluto was 0.078 higher than that of a model trained from scratch. In an independent evaluation on external MS eluted-ligand datasets, Pluto achieved an average 0.1% PPV of 0.4255, higher than the model trained from scratch (0.3824) and other widely used tools, including MixMHCpred (0.3369), NetMHCpan4.0-EL (0.4000), NetMHCpan4.0-BA (0.3188) and MHCflurry (0.3002). In an immunogenicity prediction evaluation, Pluto also identified more neoantigens than the other tools. Pluto is publicly available at https://github.com/weipenegHU/Pluto.
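
The abstract reports a "0.1% PPV" without defining it. The sketch below illustrates one common reading of the metric, assumed here rather than taken from the paper: peptides are ranked by predicted score, the top 0.1% are called positive, and PPV is the fraction of those calls that are true MS-identified ligands. The function name `top_fraction_ppv` and the toy scores and labels are hypothetical and are not part of Pluto.

```python
import math


def top_fraction_ppv(scores, labels, fraction=0.001):
    """PPV among the top `fraction` of peptides ranked by predicted score.

    Assumed convention (not confirmed by the abstract): labels are 1 for
    true ligands and 0 for decoys, and higher scores mean "more likely
    to be presented".
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not scores:
        raise ValueError("at least one peptide is required")
    # Number of peptides called positive; always keep at least one.
    n_top = max(1, math.floor(len(scores) * fraction))
    # Rank peptide indices from highest to lowest predicted score.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = sum(labels[i] for i in ranked[:n_top])
    return hits / n_top


# Toy data (made up): 5 true ligands hidden among 4995 decoys.
scores = [0.99, 0.98, 0.97, 0.40, 0.30] + [0.50] * 3 + [0.10] * 4992
labels = [1, 1, 1, 1, 1] + [0] * 4995
ppv = top_fraction_ppv(scores, labels)
print(ppv)
```

With 5000 peptides the top 0.1% contains five peptides. In this toy example three of them are true ligands, because two ligands are scored below some decoys, so the PPV is 3/5.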

Key words: immunotherapy, neoantigen, epitope presentation, deep learning, transfer learning