Hereditas (遗传), 2021, Vol. 43, Issue 10: 938-948. doi: 10.16288/j.yczz.21-185

• Research Report •

AI-SNPs screening based on the whole genome data and research on genetic structure differences of subcontinent populations

Haoyu Wang, Yuhan Hu, Yueyan Cao, Qiang Zhu, Yuguo Huang, Xi Li, Ji Zhang

  1. West China School of Basic Medical Sciences and Forensic Medicine, Sichuan University, Chengdu 610041, China
  • Received: 2021-05-26 Revised: 2021-07-23 Online: 2021-08-04 Published: 2021-10-20
  • About the authors: Haoyu Wang, master's student, research area: forensic genetics. E-mail: wanghy0707@gmail.com. Yuhan Hu, master's student, research area: forensic genetics. E-mail: huyuhan28@163.com. Haoyu Wang and Yuhan Hu contributed equally as co-first authors.
  • Supported by:
    National Natural Science Foundation of China (Nos. 81571861 and 81630054)

Abstract:

Differences in population genetic structure are a key factor to consider in medical research involving samples from multiple populations. A panel of ancestry-informative single nucleotide polymorphisms (AI-SNPs) can be used to analyze the genetic components of a population, infer the ancestral origin of individuals, and pre-screen samples, thereby reducing the impact of population genetic structure differences on medical research. However, most published studies have focused on differences between continents or between subcontinental regions, rather than among populations within a single subcontinental region. In this study, AI-SNPs were screened by calculating the FST value for each pair of five East Asian populations from the 1000 Genomes Project phase 3 (GRCh37.p13), namely Japanese in Tokyo (JPT), Han Chinese in Beijing (CHB), Southern Han Chinese (CHS), Chinese Dai in Xishuangbanna (CDX), and Kinh in Ho Chi Minh City (KHV), in order to analyze genetic structure differences within a subcontinental region. The five populations fell into three clusters: (1) JPT; (2) CHB and CHS; and (3) CDX and KHV. The screened AI-SNPs resolved the genetic composition of individuals and could be used to select representative individuals: those whose population-representative genetic component exceeded 80% were good representatives of their population. These results demonstrate the practical value of using an FST-screened AI-SNP panel to verify the ancestral composition of samples and to select representative samples, thereby reducing the influence of genetic structure differences among populations within a subcontinental region on population-related medical research.

Key words: ancestry-informative marker, single nucleotide polymorphism (SNP), East Asian populations, genetic structure differences
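The abstract states only that AI-SNPs were screened by pairwise FST and that individuals whose population-representative component exceeded 80% were retained; it does not give the FST estimator, the per-pair cut-off, or the software used to estimate individual ancestry proportions. The following is a minimal Python sketch under those gaps, not the authors' pipeline. It assumes Hudson's FST estimator in the form of Bhatia et al. (2013), a hypothetical `top_k` cut-off per population pair, and per-individual ancestry proportions supplied by an ADMIXTURE-style analysis. All function names (`allele_freq`, `hudson_fst`, `screen_ai_snps`, `representative_individuals`) and the toy genotypes are illustrative only.

```python
from itertools import combinations

# Hypothetical helpers illustrating FST-based AI-SNP screening; not the authors' pipeline.


def allele_freq(dosages):
    """Alternate-allele frequency from diploid 0/1/2 dosage calls."""
    return sum(dosages) / (2 * len(dosages))


def hudson_fst(p1, n1, p2, n2):
    """Per-SNP Hudson F_ST (Bhatia et al. 2013 form; assumed estimator).

    p1, p2: allele frequencies in the two populations.
    n1, n2: numbers of sampled chromosomes (2 x individuals).
    The sample-size correction can make values slightly negative.
    """
    num = (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
    den = p1 * (1 - p2) + p2 * (1 - p1)
    # den == 0 only if the SNP is fixed for the same allele in both populations.
    return num / den if den > 0 else 0.0


def screen_ai_snps(genos, top_k):
    """Union of the top_k highest-F_ST SNPs over every population pair.

    genos: {population: {snp_id: [dosages]}}; all populations share one SNP set.
    top_k: hypothetical per-pair cut-off (the abstract does not state one).
    """
    selected = set()
    for a, b in combinations(sorted(genos), 2):
        scores = []
        for snp, ga in genos[a].items():
            gb = genos[b][snp]
            fst = hudson_fst(allele_freq(ga), 2 * len(ga),
                             allele_freq(gb), 2 * len(gb))
            scores.append((fst, snp))
        # Highest F_ST first; ties broken by SNP id for reproducibility.
        scores.sort(key=lambda t: (-t[0], t[1]))
        selected.update(snp for _, snp in scores[:top_k])
    return selected


def representative_individuals(proportions, component, threshold=0.8):
    """IDs whose share of `component` is strictly above `threshold`.

    proportions: {individual: {component: fraction}}, e.g. ancestry
    proportions estimated on the screened SNPs (assumed to come from an
    ADMIXTURE-style analysis).
    """
    return sorted(ind for ind, q in proportions.items()
                  if q.get(component, 0.0) > threshold)


if __name__ == "__main__":
    # Toy dosages (made up) for three populations and three SNPs.
    genos = {
        "JPT": {"rs_a": [2, 2, 2, 1], "rs_b": [0, 1, 0, 1], "rs_c": [1, 1, 1, 1]},
        "CHB": {"rs_a": [0, 0, 1, 0], "rs_b": [0, 1, 1, 0], "rs_c": [1, 1, 1, 1]},
        "CDX": {"rs_a": [0, 0, 0, 0], "rs_b": [2, 2, 2, 2], "rs_c": [1, 1, 1, 1]},
    }
    print(sorted(screen_ai_snps(genos, top_k=1)))  # ['rs_a', 'rs_b']
    q = {"ind1": {"K1": 0.93, "K2": 0.07}, "ind2": {"K1": 0.62, "K2": 0.38}}
    print(representative_individuals(q, "K1"))  # ['ind1']
```

Taking the union of per-pair top SNPs is one simple way to make sure every pair of populations, including closely related ones such as CHB and CHS, contributes markers to the panel; a single global FST threshold would be an equally plausible alternative.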