遗传 (Hereditas (Beijing))

• Research Report •

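AI-SNP screening based on whole-genome data and a study of genetic structure differences among subcontinental populations [Special Issue]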

王浩宇,胡渝涵,曹悦岩,朱强,黄雨果,李茜,张霁   

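  1. West China School of Basic Medical Sciences and Forensic Medicine, Sichuan University
  • Received: 2021-05-26 Revised: 2021-07-29 Online: 2021-08-04 Published: 2021-08-04
  • Corresponding author: Ji Zhang (张霁)
  • Funding:
    National Natural Science Foundation of China; National Natural Science Foundation of China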

AI-SNPs screening based on the whole genome data and research on genetic structure differences of subcontinent populations


  • Received:2021-05-26 Revised:2021-07-29 Online:2021-08-04 Published:2021-08-04
  • Contact: Ji Zhang

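Abstract: In medical research involving samples from multiple populations, differences in population genetic structure are a factor that cannot be ignored. Ancestry-informative single nucleotide polymorphisms (AI-SNPs) can be used to analyze the genetic components of populations, infer the genetic background of individuals, and pre-screen population samples, which effectively reduces the impact of such differences on medical research. Most published studies have addressed genetic structure differences between continental or subcontinental populations. This study used data from five East Asian populations in the 1000 Genomes Project (GRCh37.p13), namely Japanese in Tokyo (JPT), Han Chinese in Beijing (CHB), Southern Han Chinese (CHS), Chinese Dai in Xishuangbanna (CDX), and Kinh in Vietnam (KHV), to screen AI-SNPs by FST value and to analyze genetic structure differences among populations within a subcontinent. The results showed that the East Asian populations fall into three clusters: JPT; CHB and CHS; and CDX and KHV. The AI-SNPs successfully resolved the genetic background of individuals, and individuals in whom the population-representative genetic component exceeded 80% represented their population well. This study shows that screening a panel of AI-SNPs by FST value, in order to verify the genetic background of samples and to select population-representative samples, has practical value for reducing the impact of subcontinental population genetic structure differences on population-related medical research.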

Keywords: ancestry-informative markers, single nucleotide polymorphism, East Asian populations, genetic structure differences

Abstract: Differences in population genetic structure are a key factor in medical research involving multiple populations. A set of ancestry-informative single nucleotide polymorphisms (AI-SNPs) can be used to analyze the genetic components of populations, infer the ancestral origin of individuals, and pre-filter samples, reducing the impact of population genetic structure differences on medical research. However, most published studies have focused on differences between continental or subcontinental populations. In this paper, AI-SNPs were screened by calculating the FST value for each pair of five East Asian populations (Japanese in Tokyo, JPT; Han Chinese in Beijing, CHB; Southern Han Chinese, CHS; Chinese Dai in Xishuangbanna, CDX; and Kinh in Ho Chi Minh City, KHV) in the 1000 Genomes Project phase 3 (GRCh37.p13), in order to analyze differences among populations within a subcontinent. The results show that the five East Asian populations fall into three clusters: JPT; CHB and CHS; and CDX and KHV. A set of AI-SNPs can be used to analyze individual genetic composition and to select representative individuals: individuals in whom the population-representative genetic component exceeds 80% represent their population well. This paper demonstrates that verifying ancestral composition and selecting representative samples with a panel of AI-SNPs screened by FST value has practical value for reducing the influence of subcontinental population genetic structure differences on population-related medical research.

Key words: Ancestry-informative marker, Single nucleotide polymorphism (SNP), East Asian populations, Genetic structure differences
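
The FST-based screening and the 80% representativeness cutoff described in the abstract can be illustrated with a minimal sketch. The abstract does not say which FST estimator or software the authors used, so this example assumes the simple Wright formulation, FST = (HT − HS) / HT, with equal population weights for a biallelic SNP. The function names (`pairwise_fst`, `rank_snps`, `is_representative`), the SNP identifiers, and the allele frequencies are all hypothetical and serve only as illustration.

```python
from itertools import combinations


def pairwise_fst(p1, p2):
    """Wright's FST for one biallelic SNP, given the allele frequency
    in two populations (equal weights assumed; not the paper's estimator)."""
    p_bar = (p1 + p2) / 2.0
    h_t = 2.0 * p_bar * (1.0 - p_bar)          # expected heterozygosity, pooled
    if h_t == 0.0:                             # SNP is monomorphic in both
        return 0.0
    h_s = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0  # mean within-pop
    return (h_t - h_s) / h_t


def rank_snps(freqs, top_n):
    """Rank SNPs by their largest pairwise FST over all population pairs.

    freqs maps snp_id -> {population_code: allele_frequency}.
    Returns the top_n SNP ids, most differentiated first.
    """
    scored = []
    for snp, by_pop in freqs.items():
        best = max(
            pairwise_fst(by_pop[a], by_pop[b])
            for a, b in combinations(sorted(by_pop), 2)
        )
        scored.append((best, snp))
    scored.sort(reverse=True)
    return [snp for _, snp in scored[:top_n]]


def is_representative(components, population, threshold=0.80):
    """True if the individual's ancestry component for `population`
    exceeds the threshold (80% in the abstract)."""
    return components.get(population, 0.0) > threshold


# Toy allele frequencies (invented for illustration, not from the paper).
toy_freqs = {
    "snp_a": {"JPT": 0.9, "CHB": 0.5, "CDX": 0.1},
    "snp_b": {"JPT": 0.3, "CHB": 0.3, "CDX": 0.3},
    "snp_c": {"JPT": 0.4, "CHB": 0.5, "CDX": 0.6},
}
candidate_panel = rank_snps(toy_freqs, top_n=2)
```

In practice the ancestry components passed to `is_representative` would come from a clustering or admixture analysis run on the selected panel; that step is outside the scope of this sketch.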