Hereditas (Beijing), 2018, Vol. 40, Issue (4): 305-314. doi: 10.16288/j.yczz.17-394

• Review •

Methods and applications for estimating the genomic breed composition of individual animals from SNP chip data

何俊 (Jun He)1,3, 钱长嵩 (Changsong Qian)2, Richard G. Tait Jr.3, Stewart Bauck3, 吴晓林 (Xiaolin Wu)1,3,4

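    1. College of Animal Science and Technology, Hunan Agricultural University, Changsha 410128
    2. Neogen Bio-Scientific Technology (Shanghai) Co., Ltd., Shanghai 200072
    3. Department of Bioinformatics and Biostatistics, Neogen Corporation, Lincoln, Nebraska 68504, USA
    4. Department of Animal Sciences, University of Wisconsin, Madison, Wisconsin 53706, USA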
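  • Received: 2017-12-04 Revised: 2018-02-27 Publication date: 2018-04-20 Release date: 2018-04-04
  • About the author: 何俊 (Jun He), PhD, associate professor; research interests: animal genetics, breeding and reproduction. E-mail: hejun@hunau.edu.cn
  • Funding:
    Supported by the Hundred-Talent Project of Hunan Province and the Hunan Innovation Center of Animal Safety Production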

Estimating genomic breed composition of individual animals using selected SNPs

Jun He1,3, Changsong Qian2, Richard G. Tait Jr.3, Stewart Bauck3, Xiaolin Wu1,3,4

    1. College of Animal Science and Technology, Hunan Agricultural University, Changsha 410128, China
    2. Neogen Bio-Scientific Technology (Shanghai) Co., Ltd., Shanghai 200072, China
    3. Biostatistics and Bioinformatics, Neogen GeneSeek, Lincoln, NE 68504, USA
    4. Department of Animal Sciences, University of Wisconsin, Madison, WI 53706, USA
  • Received: 2017-12-04 Revised: 2018-02-27 Published: 2018-04-20 Online: 2018-04-04
  • Supported by:
    The Hundred-Talent Project of Hunan Province and the Hunan Innovation Center of Animal Safety Production

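Abstract (translated from the Chinese):

Natural and artificial selection, geographical isolation and genetic drift cause allele frequencies at many loci in the animal genome to differ among populations. For an individual descended from crosses among different breeds (ancestors), the genome is correlated to some degree with the allele frequencies (genotypes) of those breeds (ancestors). With a suitable statistical model and method of analysis, one can therefore estimate the proportion that each breed (ancestor) contributes to the individual's genome, known as the individual's genomic breed composition (GBC). This paper describes the principles, methods and steps for estimating the GBC of individual animals from SNP chip data. Using the GBC evaluation of 198 Akaushi (Japanese Red) cattle awaiting breed verification as an example, it demonstrates the specific steps for estimating individual GBC with a regression model and with a mixture distribution (admixture) model, including selection of SNP subsets, selection of individual animals for the reference populations, and computation of GBC for the animals under evaluation. The reference populations comprised 36,574 animals from five breeds (Akaushi, Angus, Hereford, Holstein and Jersey), each with 40K or 50K chip data. Selecting, from existing commercial SNP chips, subsets of SNPs for breed identification and for estimating individual GBC extends and further develops the functionality of these chips. In addition, how GBC estimates obtained from SNP genotypes can be used in genomic selection to improve prediction accuracy for purebred and crossbred animals merits further study.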

Keywords: genomic breed composition, regression model, mixture distribution (admixture) model, genomic prediction, SNP chip
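The regression approach named in the abstract can be illustrated with a minimal sketch. It rests on assumptions the abstract does not spell out: the animal's allele dosage (allele count divided by 2) is regressed on the reference breeds' allele frequencies without an intercept, negative coefficients are set to zero, and the remainder is rescaled to sum to one. The function names, the two-breed setup and the allele frequencies are hypothetical and for illustration only.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("breed allele frequencies are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def estimate_gbc_regression(genotypes, breed_freqs):
    """Estimate GBC for one animal.

    genotypes   -- allele counts (0, 1 or 2) at each SNP
    breed_freqs -- {breed name: allele frequency at each SNP}
    """
    breeds = list(breed_freqs)
    y = [g / 2.0 for g in genotypes]          # dosage on the 0..1 scale
    cols = [breed_freqs[b] for b in breeds]   # one column per breed
    if any(len(col) != len(y) for col in cols):
        raise ValueError("every breed needs one frequency per SNP")
    # Normal equations X'X beta = X'y (assumption: no intercept).
    xtx = [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(u * v for u, v in zip(ci, y)) for ci in cols]
    beta = solve_linear_system(xtx, xty)
    # Assumption: clip negative coefficients, then rescale to sum to 1.
    clipped = [max(value, 0.0) for value in beta]
    total = sum(clipped)
    if total == 0.0:
        raise ValueError("no breed has a positive coefficient")
    return {b: value / total for b, value in zip(breeds, clipped)}


# Hypothetical allele frequencies at six SNPs in two reference breeds.
demo_freqs = {
    "Akaushi": [0.9, 0.1, 0.8, 0.2, 0.7, 0.3],
    "Angus": [0.1, 0.9, 0.2, 0.8, 0.4, 0.6],
}
demo_genotypes = [2, 1, 1, 1, 1, 1]
print(estimate_gbc_regression(demo_genotypes, demo_freqs))
```

Clipping and rescaling is only one simple way to keep the estimates interpretable as proportions; a constrained least-squares solver would be an alternative. With real data the reference frequencies would come from the selected SNP panel rather than six made-up loci.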

Abstract:

Natural and artificial selection, geographical isolation and genetic drift can cause allele frequencies of single nucleotide polymorphisms (SNPs) to differ among populations at many loci in the animal genome. For individuals whose ancestors originated from different breeds or populations, the genome consists of multiple components that correlate with the genotypes or allele frequencies of those breeds or populations. Therefore, with an appropriate statistical method, one can estimate the contribution of each breed (ancestor) to the genome of each individual animal, which is referred to as the genomic breed composition (GBC). This paper reviews the principles, statistical methods and steps for estimating the GBC of individual animals from SNP genotype data. The protocols are demonstrated with a linear regression model and with an admixture model through the breed characterization of 198 purported Akaushi cattle, covering the selection of reference SNPs, the selection of reference animals, and the computation of GBC for the animals under evaluation. The reference populations consist of 36,574 cattle from five breeds (Akaushi, Angus, Hereford, Holstein and Jersey), each genotyped on either a 40K or a 50K SNP chip. Four common SNP panels, screened from commercial chips, were selected for estimating the GBC of individual animals, which extends the functionality of currently available commercial SNP chips. How estimated GBC can be incorporated to improve the accuracy of genomic prediction in both purebred and crossbred animals remains to be explored in future studies.

Key words: admixture model, genomic breed composition, genomic prediction, regression models, SNP chip
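The admixture model named in the abstract can be sketched in a similar spirit. The version below is a generic supervised mixture model, not necessarily the formulation used in the paper. It treats the reference breeds' allele frequencies as known, assumes each of an animal's allele copies is drawn from breed k with probability q_k, and estimates the proportions q with an EM algorithm. The function name, iteration settings and allele frequencies are hypothetical.

```python
def estimate_gbc_admixture(genotypes, breed_freqs, n_iter=500, tol=1e-10):
    """Estimate admixture proportions for one animal by EM.

    genotypes   -- allele counts (0, 1 or 2) of the reference allele per SNP
    breed_freqs -- {breed name: reference-allele frequency at each SNP}
    """
    breeds = list(breed_freqs)
    k = len(breeds)
    eps = 1e-6  # keep frequencies away from 0 and 1 to avoid division by zero
    freqs = [
        [min(max(p, eps), 1.0 - eps) for p in breed_freqs[b]] for b in breeds
    ]
    q = [1.0 / k] * k              # start from equal proportions
    n_alleles = 2 * len(genotypes)
    for _ in range(n_iter):
        expected = [0.0] * k       # expected allele copies assigned per breed
        for j, g in enumerate(genotypes):
            p_ref = [q[i] * freqs[i][j] for i in range(k)]
            p_alt = [q[i] * (1.0 - freqs[i][j]) for i in range(k)]
            sum_ref, sum_alt = sum(p_ref), sum(p_alt)
            for i in range(k):
                # E-step: posterior probability that a copy came from breed i,
                # weighted by the number of copies of each allele.
                expected[i] += g * p_ref[i] / sum_ref
                expected[i] += (2 - g) * p_alt[i] / sum_alt
        # M-step: proportions are the expected share of allele copies.
        new_q = [e / n_alleles for e in expected]
        change = max(abs(a - b) for a, b in zip(new_q, q))
        q = new_q
        if change < tol:
            break
    return dict(zip(breeds, q))


# Hypothetical allele frequencies at six SNPs in two reference breeds.
admix_freqs = {
    "Akaushi": [0.9, 0.1, 0.8, 0.2, 0.7, 0.3],
    "Angus": [0.1, 0.9, 0.2, 0.8, 0.4, 0.6],
}
admix_genotypes = [2, 1, 1, 1, 1, 1]
print(estimate_gbc_admixture(admix_genotypes, admix_freqs))
```

Unlike the regression sketch, the EM updates keep the proportions non-negative and summing to one at every iteration, so no clipping or rescaling is needed afterwards.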