Hereditas (Beijing) (遗传), 2023, Vol. 45, Issue 10: 922-932. doi: 10.16288/j.yczz.23-120

• Research Article •

Genomic prediction of pig growth traits based on machine learning

Dong Chen1,2,3, Shujie Wang1,2,3, Zhenjian Zhao1,2,3, Xiang Ji1,2,3, Qi Shen1,2,3, Yang Yu1,2,3, Shengdi Cui1,2,3, Junge Wang1,2,3, Ziyang Chen1,2,3, Jinyong Wang4, Zongyi Guo4, Pingxian Wu4, Guoqing Tang1,2,3

  1. Key Laboratory of Livestock and Poultry Multi-omics, Ministry of Agriculture and Rural Affairs, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu 611130, China
  2. Farm Animal Genetic Resources Exploration and Innovation Key Laboratory of Sichuan Province, Sichuan Agricultural University, Chengdu 611130, China
  3. State Key Laboratory of Swine and Poultry Breeding Industry, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu 611130, China
  4. National Center of Technology Innovation for Pigs, Chongqing 402460, China
  • Received: 2023-04-26 Revised: 2023-08-14 Published: 2023-10-20 Online: 2023-08-16
  • Contact: Guoqing Tang E-mail: 1123278154@qq.com; tyq003@163.com
  • About the author: Dong Chen, master's degree, field: animal science. E-mail: 1123278154@qq.com
  • Supported by:
    Strategic Priority Research Program of the National Center of Technology Innovation for Pigs (NCTIP-XD/B01); Sichuan Science and Technology Program (2020YFN0024); Sichuan Science and Technology Program (2021ZDZX0008); Sichuan Science and Technology Program (2021YFYZ0030); Sichuan Innovation Team of Pig (sccxtd-2022-08)

Abstract:

This study compared how well different machine learning models, obtained through automated machine learning, predict selected pig growth traits and their genomic estimated breeding values (GEBV), with the aim of finding suitable models to optimize genomic evaluation in pig breeding. Genomic information, pedigree matrices, fixed effects, and phenotype records from 9968 pigs from multiple companies were used, and automated machine learning selected the best model of each of four types: deep learning (DL), random forest (RF), gradient boosting machine (GBM), and extreme gradient boosting (XGB). Using 10-fold cross-validation, each model predicted both the GEBV and the phenotype of four traits: backfat corrected to 100 kg (B100), backfat corrected to 115 kg (B115), days corrected to 100 kg (D100), and days corrected to 115 kg (D115). The models predicted GEBV more accurately than phenotypes. For GEBV prediction, GBM reached accuracies of 0.683, 0.710, 0.866, and 0.871 for B100, B115, D100, and D115, respectively, slightly higher than the other methods. For phenotype prediction, the best-performing models were GBM for B100 (0.547), DL for B115 (0.547), and XGB for D100 (0.672) and D115 (0.670). Training time was far longer for RF than for the other three models, intermediate for GBM and DL, and shortest for XGB. In summary, machine learning models obtained through automated machine learning predicted GEBV more accurately than phenotypes. GBM showed the highest overall prediction accuracy together with a relatively short training time, and XGB trained reasonably accurate models in the least time. RF took far longer to train than the other three models and was not accurate enough, making it unsuitable for predicting pig growth trait phenotypes and GEBV.
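The evaluation protocol summarized in the abstract (10-fold cross-validation, with prediction accuracy and training time recorded per model) can be illustrated with a minimal sketch. It is not the authors' pipeline: the data are simulated, `fit_marginal` is a toy stand-in for the DL, RF, GBM, and XGB models, all function names are hypothetical, and measuring accuracy as the Pearson correlation between predicted and observed values is an assumption, since the abstract does not define the metric.

```python
import random
import time
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(va * vb) if va > 0 and vb > 0 else 0.0


def k_fold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and deal the indices into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_marginal(X, y):
    """Toy stand-in model (assumption): one simple regression per marker, effects summed."""
    n, m = len(X), len(X[0])
    y_mean = sum(y) / n
    x_means = [sum(row[j] for row in X) / n for j in range(m)]
    slopes = []
    for j in range(m):
        sxx = sum((row[j] - x_means[j]) ** 2 for row in X)
        sxy = sum((row[j] - x_means[j]) * (yi - y_mean) for row, yi in zip(X, y))
        slopes.append(sxy / sxx if sxx > 0 else 0.0)

    def predict(row):
        return y_mean + sum(s * (x - mu) for s, x, mu in zip(slopes, row, x_means))

    return predict


def cross_validate(fit, X, y, k=10, seed=0):
    """Return (mean per-fold Pearson accuracy, total training time in seconds)."""
    folds = k_fold_indices(len(y), k, seed)
    accuracies, train_seconds = [], 0.0
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        start = time.perf_counter()
        predict = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        train_seconds += time.perf_counter() - start  # time only the fitting step
        predicted = [predict(X[i]) for i in test_idx]
        observed = [y[i] for i in test_idx]
        accuracies.append(pearson(predicted, observed))
    return sum(accuracies) / k, train_seconds


# Simulated data (not the study's): 200 animals, 20 SNPs coded 0/1/2,
# additive marker effects plus random noise as the target value.
rng = random.Random(42)
n_animals, n_snps = 200, 20
effects = [rng.gauss(0.0, 1.0) for _ in range(n_snps)]
X = [[rng.choice((0, 1, 2)) for _ in range(n_snps)] for _ in range(n_animals)]
y = [sum(b * g for b, g in zip(effects, row)) + rng.gauss(0.0, 1.0) for row in X]

accuracy, seconds = cross_validate(fit_marginal, X, y, k=10)
print(f"mean accuracy over 10 folds: {accuracy:.3f}; training time: {seconds:.4f} s")
```

Passing a different `fit` function to `cross_validate` with the same `seed` evaluates another model on identical folds, which is what makes accuracies and training times comparable across models.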

Key words: genomic estimated breeding values, growth traits, automated machine learning, performance comparison