Hereditas (Beijing)

• Research Report •

SNP density impact on kinship inference and IBS-machine learning optimization

Lv Dai1,2, Zichen Tang3, Zhen Jia1,2, Li Jiang2, Chuantong Zhao1,2, Zhiyuan Zhao1,2, Wenting Zhao2, Caixia Li1,2

  1. School of Criminal Investigation, People's Public Security University of China, Beijing 100038, China

  2. Beijing Engineering Research Center of Crime Scene Evidence Examination, National Engineering Laboratory for Forensic Science, Institute of Forensic Science, Beijing 100038, China

  3. School of Life Sciences, Jiangsu Normal University, Xuzhou 221116, China

  • Received: 2025-12-08 Revised: 2026-02-05 Online: 2026-02-12
  • Funding:
    Supported by the Fundamental Research Funds for Institute of Forensic Science (Nos. 2024JB043, 2024JB026), the National Natural Science Foundation of China (No. 82171870), and the Beijing Nova Program of Science and Technology (No. 20220484149)

Abstract:

In recent years, multiple panels containing different numbers of single nucleotide polymorphisms (SNPs) have been reported in forensic genetics for kinship inference. However, the effect of SNP number on inference performance, and the use of machine learning algorithms for this task, have not been explored systematically. We therefore evaluated how SNP number affects kinship inference performance and how far machine learning methods can improve the identity-by-state (IBS) algorithm. First, we constructed multiple SNP panels containing 15,476 to 20,838 SNPs and, using simulated pedigrees, evaluated the likelihood ratio (LR) method and the IBS algorithm at each panel size. After selecting the optimal SNP panel, we validated it on real pedigrees and then combined the IBS algorithm with machine learning methods to improve inference performance. For the LR method, sensitivity for sixth- and seventh-degree kinship showed a significant positive correlation with SNP number. For the IBS algorithm, sensitivity for fourth- to seventh-degree kinship was also significantly positively correlated with SNP number, but the actual gain was small (only 0.5% to 2.2%). Based on these results, we selected the panel containing 20,838 SNPs (the 21K panel) as optimal. With the LR method, the 21K panel accurately inferred kinship up to the sixth degree (sensitivity of 93.65% for sixth-degree kinship); with the IBS algorithm, it accurately inferred kinship up to the third degree (sensitivity of 86.79% for third-degree kinship). After the IBS algorithm was combined with machine learning, sensitivity for fourth-degree kinship improved from 69.10% to 87.66%, and sensitivities for fifth- and sixth-degree kinship improved from 38.03% and 21.41% to 48.75% and 37.80%, respectively.
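The two quantities the abstract relies on, an IBS score between two individuals and the per-degree sensitivity of an inference method, can be illustrated with a minimal Python sketch. This is a generic illustration only: the abstract does not state the exact IBS scoring scheme, decision thresholds, or machine learning features used in the study, so the genotype coding (alternate-allele counts 0, 1, 2), the use of a mean score, and the function names `ibs_per_locus`, `ibs_score`, and `sensitivity` are all assumptions made here.

```python
from typing import Optional, Sequence


def ibs_per_locus(g1: int, g2: int) -> int:
    """Number of alleles shared identical-by-state at one biallelic SNP.

    Assumed coding: each genotype is the count of the alternate allele
    (0, 1, or 2), so the shared-allele count is 2 minus the difference.
    """
    return 2 - abs(g1 - g2)


def ibs_score(a: Sequence[Optional[int]], b: Sequence[Optional[int]]) -> float:
    """Mean IBS (range 0 to 2) over loci typed in both individuals.

    None marks a missing genotype; such loci are skipped.
    """
    if len(a) != len(b):
        raise ValueError("genotype vectors must cover the same loci")
    shared = [ibs_per_locus(x, y) for x, y in zip(a, b)
              if x is not None and y is not None]
    if not shared:
        raise ValueError("no locus is typed in both individuals")
    return sum(shared) / len(shared)


def sensitivity(true_degrees: Sequence[int],
                predicted_degrees: Sequence[int],
                degree: int) -> float:
    """Fraction of pairs whose true kinship degree is `degree`
    that were also assigned `degree` by the inference method."""
    calls = [p for t, p in zip(true_degrees, predicted_degrees) if t == degree]
    if not calls:
        raise ValueError("no pairs with the requested true degree")
    return sum(p == degree for p in calls) / len(calls)
```

For instance, `ibs_score([0, 1, 2, 2], [0, 1, 0, 1])` averages per-locus values of 2, 2, 0, and 1. In practice a score like this would be computed over all SNPs in a panel, which is why panel size can affect how well closely spaced kinship degrees are separated.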

Key words:

kinship inference, likelihood ratio method, IBS method, machine learning