Hereditas (Beijing), 2010, Vol. 32, Issue 9: 921-928. doi: 10.3724/SP.J.1005.2010.00921

• Research Report •

Application of Random SNPs to Population Stratification Analysis in Genome-wide Association Studies

CAO Zong-Fu1, 2, MA Chuan-Xiang1, 2, WANG Lei1, 2, CAI Bin1, 2

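  1. National Engineering Research Center for Beijing Biochip Technology, Beijing 102206, China; 2. CapitalBio Corporation, Beijing 102206, China
  • Received: 2009-11-12 Revised: 2010-03-10 Online: 2010-09-20 Published: 2010-09-20
  • Corresponding author: CAI Bin E-mail: bcai@capitalbio.com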
  • Funding:

    Supported by the National High Technology Research and Development Program of China (863 Program) (No. 2009AA022708)

Analysis of Population Stratification Using Random SNPs in Genome-wide Association Studies

CAO Zong-Fu1, 2, MA Chuan-Xiang1, 2, WANG Lei1, 2, CAI Bin1, 2   

  1. National Engineering Research Center for Beijing Biochip Technology, Beijing 102206, China; 2. CapitalBio Corporation, Beijing 102206, China
  • Received: 2009-11-12 Revised: 2010-03-10 Online: 2010-09-20 Published: 2010-09-20
  • Contact: CAI Bin E-mail: bcai@capitalbio.com

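Abstract: In genome-wide association studies (GWAS) of complex diseases, population stratification inflates the false-positive rate, so it is necessary to account for population genetic structure and to control for stratification. How well randomly selected SNPs perform in population stratification analysis, however, remains to be explored. Using Affymetrix SNP 6.0 array genotype data from unrelated individuals in the HapMap Phase 2 populations, this study selected different numbers of SNPs at random and evenly across the whole genome, and also screened for ancestry informative markers (AIMs) using the f value and Fisher's exact test. Data from unrelated individuals in HapMap Phase 3 were then used to evaluate how well the different SNP sets distinguished populations, by two methods: F-statistics and STRUCTURE analysis. Randomly selected SNPs evenly distributed across the genome could be used to identify genetic structure within populations. The study further suggests that, in a GWAS for which no AIMs are available for the specific population, more than 3000 evenly distributed SNPs selected at random across the genome can be used to control for population stratification.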

Keywords: genome-wide association study, population stratification, ancestry informative markers, random SNPs, Affymetrix SNP 6.0 array

Abstract: Because population genetic structure can increase the false-positive rate in genome-wide association studies (GWAS) of complex diseases, the effect of population stratification should be taken into account in GWAS. However, how well randomly selected SNPs perform in population stratification analysis remains undetermined. In this study, based on genotype data generated on the Genome-Wide Human SNP Array 6.0 from unrelated individuals of HapMap Phase2, we randomly selected sets of SNPs that were evenly distributed across the whole genome, and identified Ancestry Informative Markers (AIMs) using the f value and the allelic Fisher exact test. F-statistics and STRUCTURE analysis were then applied to the different SNP sets to evaluate how well each distinguished the populations of HapMap Phase3. We found that randomly selected SNPs evenly distributed across the whole genome could be used to identify population structure. The study further indicates that more than 3 000 randomly selected SNPs evenly distributed across the whole genome can substitute for AIMs in population stratification analysis when no AIMs are available for the specific population.

Key words: genome-wide association study, population stratification, ancestry informative markers, random SNP, Affymetrix SNP 6.0 array
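
The two ideas named in the abstract, drawing random SNPs that are spread evenly across the genome and scoring population differentiation with an F-statistic, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not say how even spacing was enforced or which F-statistic estimator was used, so the equal-width position bins, the equal-weight Wright F_ST formula, and the function names `select_even_random_snps` and `fst_biallelic` are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def select_even_random_snps(snps, n_bins_per_chrom, seed=0):
    """Pick one random SNP from each equal-width position bin per chromosome.

    snps: iterable of (snp_id, chrom, position) tuples.
    Binning is an assumed way to keep the random draw evenly spread;
    bins that contain no SNP are skipped.
    """
    rng = random.Random(seed)
    by_chrom = defaultdict(list)
    for snp_id, chrom, pos in snps:
        by_chrom[chrom].append((pos, snp_id))

    chosen = []
    for chrom in sorted(by_chrom):
        entries = by_chrom[chrom]
        lo = min(pos for pos, _ in entries)
        hi = max(pos for pos, _ in entries)
        # Fall back to width 1 when all SNPs share one position.
        width = (hi - lo) / n_bins_per_chrom or 1
        bins = defaultdict(list)
        for pos, snp_id in entries:
            # Clamp so the SNP at the last position falls in the final bin.
            idx = min(int((pos - lo) / width), n_bins_per_chrom - 1)
            bins[idx].append(snp_id)
        for idx in sorted(bins):
            chosen.append(rng.choice(bins[idx]))
    return chosen


def fst_biallelic(freqs):
    """Wright's F_ST for one biallelic SNP, (H_T - H_S) / H_T.

    freqs: allele frequency of the same allele in each population.
    Populations are weighted equally (an assumption); returns 0.0
    when the SNP is monomorphic across all populations.
    """
    p_bar = sum(freqs) / len(freqs)
    h_t = 2 * p_bar * (1 - p_bar)  # expected heterozygosity, pooled
    if h_t == 0:
        return 0.0
    h_s = sum(2 * p * (1 - p) for p in freqs) / len(freqs)  # mean within-pop
    return (h_t - h_s) / h_t


if __name__ == "__main__":
    # Hypothetical SNP IDs and positions, for illustration only.
    demo = [("snp%d" % i, "1", i * 100) for i in range(100)]
    print(select_even_random_snps(demo, n_bins_per_chrom=10))
    print(fst_biallelic([0.1, 0.9]))
```

Averaging `fst_biallelic` over a selected SNP set gives a rough single-number measure of how strongly that set separates the populations, so sets of different sizes can be compared on the same scale.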