Hereditas (Beijing) ›› 2024, Vol. 46 ›› Issue (9): 701-715. doi: 10.16288/j.yczz.24-151

• Review •

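  • Corresponding authors: Risu Na, PhD, Associate Professor; research interests: genetics, breeding, and reproduction of cattle and sheep. E-mail: narisu@swu.edu.cn
    Wenguang Zhang, PhD, Professor; research interests: quantitative genomics and bioinformatics. E-mail: atcgnmbi@aliyun.com
  • About the author: Yanchun Bao, PhD candidate; specialization: animal genetics, breeding, and reproduction. E-mail: byc107054@163.com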

Progress on deep learning in genomics

Yanchun Bao1,2, Caixia Shi1, Chuanqiang Zhang3,4, Mingjuan Gu1, Lin Zhu1, Zaixia Liu1,2, Le Zhou1,2, Fengying Ma1,2, Risu Na1, Wenguang Zhang5

    1. College of Animal Science and Technology, Inner Mongolia Agricultural University, Hohhot 010018, China
    2. Inner Mongolia Engineering Research Center of Genomic Big Data for Agriculture, Hohhot 010018, China
    3. Inner Mongolia Saikexing Institute of Breeding and Reproductive Biotechnology in Domestic Animal, Hohhot 011517, China
    4. National Center of Technology Innovation for Dairy Industry, Hohhot 010080, China
    5. College of Life Sciences, Inner Mongolia Agricultural University, Hohhot 010021, China
  • Received: 2024-05-27; Revised: 2024-08-09; Published: 2024-09-20; Online: 2024-08-16
  • Supported by:
    Natural Science Foundation of Inner Mongolia Autonomous Region (2021ZD05); Sub-project of the National Center of Technology Innovation for Dairy Industry (2023-JSGG-2); Basic Scientific Research Funds for Universities Directly under the Inner Mongolia Autonomous Region (BR221024, BR221113); Double First-Class Construction Fund of Inner Mongolia Agricultural University (BZX202201, QF202206, NDYB2022-1)

Abstract:

The rapid development of high-throughput sequencing has driven explosive growth in genomic data, which poses a serious challenge to the ability of traditional bioinformatics methods to handle complex data patterns. Deep learning, a frontier technology in artificial intelligence, offers powerful capabilities for data analysis and pattern recognition and has brought new momentum to genomic research. In this review, we focus on four major deep learning models: the convolutional neural network (CNN), recurrent neural network (RNN), long short-term memory network (LSTM), and generative adversarial network (GAN). We outline their core principles and review their applications in DNA, RNA, and protein research over the past five years. We also examine applications of deep learning in livestock genomics, highlighting its potential value and the challenges it faces in genetic trait analysis, disease prevention, and genetic improvement. Through this analysis, we aim to clarify how deep learning can improve the accuracy and processing capacity of genomic data analysis, and to offer a conceptual framework that guides the development of livestock genomics research strategies and their application in specific scenarios, thereby advancing precision livestock farming and genetic improvement technologies.

Key words: deep learning, genome, CNN, RNN, LSTM, GAN