遗传 ›› 2023, Vol. 45 ›› Issue (4): 324-340. doi: 10.16288/j.yczz.22-385

• 研究报告 •

利用人类全基因组亚硫酸氢盐测序数据检测CNVs的研究

徐丹同, 王祎菲, 蔡佳丽, 龚文滔, 潘向春, 田雨晗, 沈箐鹏, 李加琪, 袁晓龙

  1. 华南农业大学动物科学学院/广东省农业动物基因组学与分子育种重点实验室/国家生猪种业工程技术研究中心, 广州 510642
  • 收稿日期:2023-01-18 修回日期:2023-03-02 出版日期:2023-04-20 发布日期:2023-03-06
  • 通讯作者: 袁晓龙 E-mail:xdt2020@163.com;yxl@scau.edu.cn
  • 作者简介:徐丹同,在读硕士研究生,专业方向:动物遗传育种与繁殖。E-mail: xdt2020@163.com
  • 基金资助:
    国家生猪产业技术体系(CARS-35);广东省科技专项资金(210713156902656)

Study on detection of CNVs using human whole genome bisulfite sequencing data

Dantong Xu, Yifei Wang, Jiali Cai, Wentao Gong, Xiangchun Pan, Yuhan Tian, Qingpeng Shen, Jiaqi Li, Xiaolong Yuan

  1. College of Animal Science, South China Agricultural University/Guangdong Provincial Key Laboratory of Agro-animal Genomics and Molecular Breeding/National Engineering Research Center for Breeding Swine Industry, Guangzhou 510642, China
  • Received: 2023-01-18  Revised: 2023-03-02  Online: 2023-04-20  Published: 2023-03-06
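  • Contact: Xiaolong Yuan  E-mail: xdt2020@163.com; yxl@scau.edu.cn
  • About the author: Dantong Xu, master's student; research area: animal genetics, breeding and reproduction. E-mail: xdt2020@163.com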
  • Supported by:
    Earmarked Fund for China Agriculture Research System (CARS-35); Earmarked Fund for Guangdong Provincial Science and Technology Project (210713156902656)

摘要:

DNA甲基化异常可能导致拷贝数变异(copy number variants,CNVs)的发生,而CNVs的发生又可能改变DNA甲基化水平。全基因组亚硫酸氢盐测序(whole genome bisulfite sequencing,WGBS)技术能够获得DNA水平的测序数据,具有挖掘CNVs的潜力和优势,但利用WGBS数据挖掘CNVs的效果尚不清楚。本研究选取了5款检测CNVs不同策略的软件(BreakDancer、cn.mops、CNVnator、DELLY、Pindel),基于人类的真实(2.62 billion reads)和模拟(12.35 billion reads)测序数据,进行150次CNVs检测,评估CNVs检出数量、精确率、召回率、相对检出能力、内存占用和运行时间等指标,旨在讨论利用WGBS数据检测CNVs的最佳方案。基于真实WGBS数据,Pindel检出缺失型和重复型CNVs的数量最多,CNVnator对缺失型CNVs的检测精确率最高,cn.mops对重复型CNVs的检测精确率最高,Pindel对缺失型CNVs的召回率最高,cn.mops对重复型CNVs的召回率最高。基于模拟WGBS数据,BreakDancer检出缺失型CNVs数量最多,cn.mops检出重复型CNVs数量最多,CNVnator对缺失型和重复型CNVs的检测精确率和召回率均为最高。与全基因组测序数据相比,CNVnator在真实和模拟WGBS数据中检出CNVs的能力与之相当。此外,DELLY和BreakDancer的内存占用峰值和CPU运行时间最小,CNVnator的内存占用峰值和CPU运行时间最大。结果表明,利用WGBS数据检测CNVs具有可行性,使用CNVnator和cn.mops在WGBS数据上检测CNVs的准确率较高,这些工作为利用WGBS数据深入研究CNVs和DNA甲基化之间的相互关系提供一定的参考和帮助。

关键词: 全基因组亚硫酸氢盐测序, 拷贝数变异, 软件评估

Abstract:

Aberrant DNA methylation may give rise to copy number variations (CNVs), and CNVs may in turn alter DNA methylation levels. Whole genome bisulfite sequencing (WGBS) produces DNA-level sequencing data and therefore has the potential to detect CNVs, but how well CNVs can be detected from WGBS data remains unclear. In this study, five tools that use different CNV detection strategies (BreakDancer, cn.mops, CNVnator, DELLY and Pindel) were benchmarked on WGBS data. Using real (2.62 billion reads) and simulated (12.35 billion reads) human WGBS data, we performed 150 CNV detection runs and evaluated the number of CNVs detected, precision, recall, relative detection ability, memory usage, and running time, in order to identify the best strategy for detecting CNVs from WGBS data. On the real WGBS data, Pindel detected the most deletions and duplications; CNVnator had the highest precision for deletions and cn.mops the highest precision for duplications; Pindel had the highest recall for deletions and cn.mops the highest recall for duplications. On the simulated WGBS data, BreakDancer detected the most deletions and cn.mops the most duplications, while CNVnator had the highest precision and recall for both deletions and duplications. In both real and simulated WGBS data, the ability of CNVnator to detect CNVs was comparable to that in whole genome sequencing data. In addition, DELLY and BreakDancer had the lowest peak memory usage and the shortest CPU runtime, whereas CNVnator had the highest peak memory usage and the longest CPU runtime. Taken together, these results suggest that detecting CNVs from WGBS data is feasible, and that CNVnator and cn.mops detect CNVs from WGBS data with relatively high accuracy. They provide a useful reference for further investigating the relationship between CNVs and DNA methylation using WGBS data alone.
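To make the precision and recall metrics above concrete, the following is a minimal sketch of how a set of CNV calls could be scored against a reference set. The abstract does not state how a call was matched to a reference CNV, so the 50% reciprocal-overlap rule, the function names, and the example coordinates are all assumptions chosen for illustration, not the authors' procedure.

```python
from typing import List, Tuple

# (chromosome, start, end) with half-open coordinates; a hypothetical layout.
Interval = Tuple[str, int, int]


def reciprocal_overlap(a: Interval, b: Interval) -> float:
    """Return the smaller of the two overlap fractions, or 0.0 if disjoint."""
    if a[0] != b[0]:
        return 0.0
    shared = min(a[2], b[2]) - max(a[1], b[1])
    if shared <= 0:
        return 0.0
    return min(shared / (a[2] - a[1]), shared / (b[2] - b[1]))


def precision_recall(
    calls: List[Interval],
    truth: List[Interval],
    min_overlap: float = 0.5,  # assumed matching threshold, not from the paper
) -> Tuple[float, float]:
    """Precision = matched calls / all calls; recall = matched truth / all truth."""
    matched_calls = sum(
        1 for c in calls
        if any(reciprocal_overlap(c, t) >= min_overlap for t in truth)
    )
    matched_truth = sum(
        1 for t in truth
        if any(reciprocal_overlap(c, t) >= min_overlap for c in calls)
    )
    precision = matched_calls / len(calls) if calls else 0.0
    recall = matched_truth / len(truth) if truth else 0.0
    return precision, recall


# Made-up deletions for illustration only.
truth = [("chr1", 1000, 2000), ("chr1", 5000, 6000), ("chr2", 100, 400)]
calls = [("chr1", 1100, 2100), ("chr1", 8000, 9000)]
precision, recall = precision_recall(calls, truth)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy example one of the two calls matches a reference deletion, so precision is 0.5, and one of the three reference deletions is recovered, so recall is one third. Deletions and duplications would be scored separately, as the abstract reports them.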

Key words: whole genome bisulfite sequencing, copy number variation, software evaluation