Hereditas (Beijing)

• Research Article •

Genotype imputation improves SNP density and genetic analysis accuracy in slash pine

Yuxuan Jiang¹, Liming Bian¹, Shiliang Zhou¹, Charles Chen⁴, Yousry A. El-Kassaby², Zhiqiang Chen¹, Harry X. Wu¹,³

  1. College of Forestry and Grassland, State Key Laboratory of Tree Genetics and Breeding, Co-Innovation Center for Sustainable Forestry in Southern China, Nanjing Forestry University, Nanjing 210037, China

  2. Faculty of Forestry, University of British Columbia, Vancouver, BC V6T1Z4, Canada

  3. College of Forestry, Umeå Plant Science Centre, Swedish University of Agricultural Sciences, Umeå 90183, Sweden

  4. Institute for Biosecurity & Microbial Forensics, Department of Biochemistry and Molecular Biology, Oklahoma State University, Stillwater, OK 74075, USA
  • Online: 2026-03-11
  • Supported by:
    Supported by the Scientific and Technological Innovation 2030 “Major Project in Agricultural Biotechnology Breeding” of the Ministry of Science and Technology of the People's Republic of China (No. 2023ZD0405805)

Abstract: Slash pine (Pinus elliottii) has an exceptionally large, repeat-rich genome, and existing low-density single nucleotide polymorphism (SNP) arrays offer limited marker coverage and limited resolution of linkage patterns. To increase marker density for population genetic analyses and to improve the accuracy of genomic relationship matrix (GRM) estimation, we constructed a reference panel from ~10× whole-genome resequencing data of 50 maternal parents and performed genome-wide imputation of 51K SNP-array genotypes for 715 half-sib progeny. Imputation accuracy at array loci was quantified with chromosome-local masking experiments, whereas loci added by the reference panel and absent from the array were evaluated for concordance, and filtered, by external validation against progeny resequencing data. Masking-based concordance remained stable at approximately 95.5%. After threshold-based filtering retained high-confidence panel-expanded loci, we obtained a high-density genotype matrix covering all 715 individuals and comprising 120,650,180 SNPs. Local linkage disequilibrium (LD) heatmaps showed more continuous LD signals and clearer block structure after densification; in the 10.22–10.33 Mb interval of chromosome 4, the proportion of high-LD SNP pairs rose from 14.5% to 27.6%. The GRM built from the densified data agreed closely with the array-based GRM in its off-diagonal elements (Pearson's r ≈ 0.984). Distance-stratified analyses further showed that GRMs constructed from imputed loci within 500 kb of array markers had higher concordance, and that concordance declined progressively in more distant windows, suggesting that long-range imputed loci add relatively little. Collectively, the reference-panel-driven imputation, validation, and integration workflow established here provides a high-density genotypic resource for genome-wide association studies and genomic selection in slash pine and related conifer species.
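The abstract reports four kinds of summary statistic: concordance at masked array genotypes, the correlation between off-diagonal GRM elements, the share of high-LD SNP pairs in a window, and a split of imputed loci by distance to the nearest array marker. The Python sketch below shows one way such quantities can be computed on toy data. It is not the authors' pipeline. Several choices are our assumptions: genotypes coded 0/1/2 with no missing calls, VanRaden's (2008) first GRM formulation, LD r² approximated by the squared correlation of unphased genotype dosages, and an r² ≥ 0.8 high-LD threshold. Only the 500 kb cutoff comes from the abstract.

```python
"""Toy sketch of the summary statistics named in the abstract.

Assumptions (ours, not the paper's): genotypes are coded 0/1/2 with no missing
calls, the GRM uses VanRaden's first method, LD r^2 is approximated by the
squared correlation of unphased genotype dosages, and a "high-LD" pair is one
with r^2 >= 0.8. Only the 500 kb distance cutoff comes from the abstract.
"""
from bisect import bisect_left
from itertools import combinations
from math import sqrt


def masked_concordance(true_calls, imputed_calls, masked_idx):
    """Fraction of masked genotypes whose imputed call equals the original call."""
    hits = sum(true_calls[i] == imputed_calls[i] for i in masked_idx)
    return hits / len(masked_idx)


def vanraden_grm(genotypes):
    """VanRaden method-1 GRM; rows are individuals, columns are SNPs."""
    n, m = len(genotypes), len(genotypes[0])
    # Frequency of the counted allele at each SNP.
    p = [sum(row[j] for row in genotypes) / (2 * n) for j in range(m)]
    # Centre each SNP column by its expected dosage 2p.
    z = [[row[j] - 2 * p[j] for j in range(m)] for row in genotypes]
    scale = 2 * sum(pj * (1 - pj) for pj in p)
    return [[sum(a * b for a, b in zip(z[i], z[k])) / scale for k in range(n)]
            for i in range(n)]


def off_diagonal(matrix):
    """Upper-triangle entries excluding the diagonal (one value per pair)."""
    return [matrix[i][k] for i, k in combinations(range(len(matrix)), 2)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def high_ld_fraction(genotypes, threshold=0.8):
    """Share of SNP pairs in a window whose genotype r^2 reaches the threshold."""
    # Monomorphic SNPs have zero variance, so r^2 is undefined; drop them.
    columns = [c for c in zip(*genotypes) if len(set(c)) > 1]
    pairs = list(combinations(columns, 2))
    if not pairs:
        raise ValueError("need at least two polymorphic SNPs")
    high = sum(pearson(a, b) ** 2 >= threshold for a, b in pairs)
    return high / len(pairs)


def distance_to_nearest(pos, array_positions):
    """Distance in bp from pos to the nearest array marker (sorted, non-empty)."""
    i = bisect_left(array_positions, pos)
    gaps = []
    if i < len(array_positions):
        gaps.append(array_positions[i] - pos)
    if i > 0:
        gaps.append(pos - array_positions[i - 1])
    return min(gaps)


def stratify_by_distance(imputed_positions, array_positions, cutoff=500_000):
    """Split imputed loci into those within `cutoff` bp of an array marker and the rest."""
    near, far = [], []
    for pos in imputed_positions:
        if distance_to_nearest(pos, array_positions) <= cutoff:
            near.append(pos)
        else:
            far.append(pos)
    return near, far


if __name__ == "__main__":
    # Correlate off-diagonal GRM entries from two marker sets for the same individuals.
    g_array = vanraden_grm([[0, 2, 1], [1, 1, 1], [2, 0, 2]])
    g_dense = vanraden_grm([[0, 2, 1, 0], [1, 1, 1, 1], [2, 0, 2, 2]])
    print(pearson(off_diagonal(g_array), off_diagonal(g_dense)))
```

For a distance-stratified comparison of this kind, one would build one GRM from the `near` loci returned by `stratify_by_distance` and another from the `far` loci, then compare each with the array-based GRM through `pearson(off_diagonal(...), off_diagonal(...))`. At the scale described here (715 individuals, about 1.2 × 10⁸ SNPs), these pure-Python loops would be far too slow, and vectorized or dedicated tools would be needed. The sketch only illustrates the arithmetic.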

Key words: Pinus elliottii, SNP array, genotype imputation, reference panel, high-density genotype matrix, genomic relationship matrix