Hereditas (Beijing) (遗传), 2026, Vol. 48, Issue 6: 570-588. doi: 10.16288/j.yczz.25-287

• Research Article •

SNP density impact on kinship inference and IBS-machine learning optimization

Lv Dai¹,², Zichen Tang³, Zhen Jia¹,², Li Jiang², Chuantong Zhao¹,², Zhiyuan Zhao¹,², Wenting Zhao²,*, Caixia Li¹,²,*

  1 School of Criminal Investigation, People’s Public Security University of China, Beijing 100038, China
  2 Beijing Engineering Research Center of Crime Scene Evidence Examination, National Engineering Laboratory for Forensic Science, Institute of Forensic Science, Beijing 100038, China
  3 School of Life Sciences, Jiangsu Normal University, Xuzhou 221116, China
  • Received: 2025-12-08 Revised: 2026-02-05 Published: 2026-06-20 Online: 2026-02-12
  • Corresponding authors (*): Wenting Zhao, PhD, Associate Chief Forensic Physician, research interest: forensic genetics. E-mail: wtzhao@sibs.ac.cn;
    Caixia Li, PhD, Chief Forensic Physician, research interest: forensic genetics. E-mail: licaixia@tsinghua.org.cn
  • About the author: Lv Dai, master's student, specialty: criminal science and technology. E-mail: 1130039162@qq.com
  • Supported by:
    Fundamental Research Funds for Institute of Forensic Science (2024JB043); Fundamental Research Funds for Institute of Forensic Science (2024JB026); National Natural Science Foundation of China (82171870); Beijing Nova Program of Science and Technology (20220484149)

Abstract:

In recent years, multiple panels containing different numbers of single nucleotide polymorphisms (SNPs) have been reported in forensic genetics for kinship inference. However, the effect of SNP number on inference performance, and the application of machine learning algorithms to this task, have not been systematically explored. We therefore evaluated how SNP number affects kinship inference performance and how far machine learning methods can improve the identity-by-state (IBS) algorithm. We first constructed multiple SNP panels containing 15,476 to 20,838 SNPs and, using simulated pedigrees, evaluated the performance of the likelihood ratio (LR) method and the IBS algorithm at each SNP number. After selecting the optimal SNP panel, we validated it using real pedigrees and then combined the IBS algorithm with machine learning methods to improve inference performance. For the LR method, the sensitivity for sixth- and seventh-degree kinship was significantly positively correlated with SNP number. For the IBS algorithm, the sensitivity for fourth- to seventh-degree kinship was also significantly positively correlated with SNP number, but the actual gain was limited (an increase of only 0.5% to 2.2%). Based on these results, we selected the panel containing 20,838 SNPs (the 21K panel) as optimal. With the LR method, the 21K panel accurately inferred kinship up to the sixth degree (sensitivity of 93.65% for sixth-degree kinship); with the IBS algorithm, it accurately inferred kinship up to the third degree (sensitivity of 86.79% for third-degree kinship). Combining the IBS algorithm with machine learning raised the sensitivity for fourth-degree kinship from 69.10% to 87.66%, and the sensitivities for fifth- and sixth-degree kinship from 38.03% and 21.41% to 48.75% and 37.80%, respectively.
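To make the two quantities in the abstract concrete, the following is a minimal Python sketch of a mean IBS score for one pair of individuals and of a per-degree sensitivity. It is an illustration under stated assumptions, not the authors' pipeline: genotypes are assumed to be biallelic SNPs coded as 0/1/2 copies of one allele, the per-locus IBS count is taken as 2 minus the absolute genotype difference, and sensitivity is assumed to mean the fraction of pairs of a given true degree that are assigned that degree. The function names and toy data are hypothetical.

```python
from typing import Sequence


def ibs_per_locus(g1: int, g2: int) -> int:
    """Alleles shared identical-by-state at one biallelic SNP (0, 1 or 2).

    Assumes genotypes are coded as 0/1/2 copies of a reference allele.
    """
    return 2 - abs(g1 - g2)


def mean_ibs(a: Sequence[int], b: Sequence[int]) -> float:
    """Average IBS count across all loci for a pair of individuals."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("genotype vectors must be non-empty and equal length")
    return sum(ibs_per_locus(x, y) for x, y in zip(a, b)) / len(a)


def sensitivity(true_degrees: Sequence[int],
                called_degrees: Sequence[int],
                degree: int) -> float:
    """Fraction of pairs with the given true degree that were called correctly.

    This definition is an assumption; the paper may define it differently.
    """
    calls = [c for t, c in zip(true_degrees, called_degrees) if t == degree]
    if not calls:
        raise ValueError(f"no pairs with true degree {degree}")
    return sum(c == degree for c in calls) / len(calls)


# Toy data (hypothetical, six SNPs only; real panels hold thousands).
pair_a = [0, 1, 2, 1, 0, 2]
pair_b = [0, 2, 2, 0, 1, 2]
print(mean_ibs(pair_a, pair_b))  # 1.5

true_degree = [4, 4, 4, 4, 5]
called_degree = [4, 4, 4, 3, 5]
print(sensitivity(true_degree, called_degree, 4))  # 0.75
```

Because the expected mean IBS for distant relatives approaches that of unrelated pairs, a single score separates high-degree kinship poorly, which is consistent with the limited IBS sensitivities reported above for the fourth degree and beyond.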

Key words: kinship inference, likelihood ratio method, IBS method, machine learning