Hereditas (Beijing), 2023, Vol. 45, Issue 4: 324-340. doi: 10.16288/j.yczz.22-385
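Dantong Xu, Yifei Wang, Jiali Cai, Wentao Gong, Xiangchun Pan, Yuhan Tian, Qingpeng Shen, Jiaqi Li, Xiaolong Yuan
Received: 2023-01-18
Revised: 2023-03-02
Published: 2023-04-20
Released online: 2023-03-06
Corresponding author: Xiaolong Yuan
E-mail: xdt2020@163.com; yxl@scau.edu.cn
About the author: Dantong Xu, master's student; research area: animal genetics, breeding and reproduction.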
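Abstract:
Abnormal DNA methylation may lead to copy number variants (CNVs), and CNVs may in turn alter DNA methylation levels. Whole genome bisulfite sequencing (WGBS) yields DNA-level sequencing data and therefore has the potential to be mined for CNVs, but how well CNVs can be detected from WGBS data remains unclear. In this study, five CNV detection tools based on different strategies (BreakDancer, cn.mops, CNVnator, DELLY, and Pindel) were selected. Using real (2.62 billion reads) and simulated (12.35 billion reads) human sequencing data, 150 CNV detection runs were performed and evaluated for the number of CNVs detected, precision, recall, relative detection ability, memory usage, and running time, with the aim of identifying the best strategy for detecting CNVs from WGBS data. On real WGBS data, Pindel detected the largest number of deletion-type and duplication-type CNVs, CNVnator had the highest precision for deletions, cn.mops had the highest precision for duplications, Pindel had the highest recall for deletions, and cn.mops had the highest recall for duplications. On simulated WGBS data, BreakDancer detected the most deletions, cn.mops detected the most duplications, and CNVnator had the highest precision and recall for both deletions and duplications. CNVnator's ability to detect CNVs in real and simulated WGBS data was comparable to its ability in whole genome sequencing (WGS) data. In addition, DELLY and BreakDancer had the lowest peak memory usage and CPU time, whereas CNVnator had the highest. These results indicate that detecting CNVs from WGBS data is feasible and that CNVnator and cn.mops detect CNVs from WGBS data with relatively high accuracy. This work provides a reference for further study of the relationship between CNVs and DNA methylation using WGBS data.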
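The precision and recall named in the abstract can be illustrated with a minimal sketch. The abstract does not state how a detected CNV is matched to a reference CNV, so the 50% reciprocal-overlap rule used here is an assumption, and the function names and toy intervals are hypothetical rather than taken from the study.

```python
from typing import List, Tuple

# (chromosome, start, end) with half-open coordinates; a hypothetical layout
Interval = Tuple[str, int, int]


def reciprocal_overlap(a: Interval, b: Interval) -> float:
    """Return the smaller of the two overlap fractions (0.0 if disjoint)."""
    if a[0] != b[0]:
        return 0.0
    overlap = min(a[2], b[2]) - max(a[1], b[1])
    if overlap <= 0:
        return 0.0
    return min(overlap / (a[2] - a[1]), overlap / (b[2] - b[1]))


def precision_recall(
    calls: List[Interval], truth: List[Interval], min_ro: float = 0.5
) -> Tuple[float, float]:
    """Precision = matched calls / all calls; recall = matched truth / all truth.

    min_ro is the assumed reciprocal-overlap threshold for a match.
    """
    matched_calls = sum(
        1 for c in calls if any(reciprocal_overlap(c, t) >= min_ro for t in truth)
    )
    matched_truth = sum(
        1 for t in truth if any(reciprocal_overlap(c, t) >= min_ro for c in calls)
    )
    precision = matched_calls / len(calls) if calls else 0.0
    recall = matched_truth / len(truth) if truth else 0.0
    return precision, recall


# Toy deletion calls and a toy reference set (illustrative values only)
calls = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200)]
truth = [("chr1", 110, 210), ("chr1", 1000, 2000)]
precision, recall = precision_recall(calls, truth)
```

In this toy case one of three calls matches a reference CNV and one of two reference CNVs is recovered. Deletions and duplications would be scored separately, as in the abstract, by passing each CNV type as its own pair of lists.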
Cite this article: Dantong Xu, Yifei Wang, Jiali Cai, Wentao Gong, Xiangchun Pan, Yuhan Tian, Qingpeng Shen, Jiaqi Li, Xiaolong Yuan. Study on detection of CNVs using human whole genome bisulfite sequencing data[J]. Hereditas (Beijing), 2023, 45(4): 324-340.
Table 1
Sources and numbering of the real data
BAM file no. | Data type | Sequencing data source (from NCBI) | Sample name |
---|---|---|---|
1 | WGS | SRR622457 | NA12878 |
2 | WGS | ERR3239334 | NA12878 |
3 | WGS | ERR194147 | NA12878 |
4 | WGBS | SRR10532133, SRR20318446, SRR20318448, SRR20318450 | NA12878 |
5 | WGBS | SRR10532131, SRR10532135, SRR6006942, SRR6006945 | NA12878 |
6 | WGBS | SRR10532128, SRR6006943, SRR6006944, SRR6006947 | NA12878 |
Table 2
Descriptive statistics of the number of CNVs detected by the five tools in real data
Software | Data type | CNV type | Mean number detected | Standard deviation | Minimum | Maximum |
---|---|---|---|---|---|---|
BreakDancer | WGBS | DEL | 23,049.67 | 19,583.18 | 723.00 | 37,318.00 |
BreakDancer | WGBS | DUP | 0 | 0 | 0 | 0 |
BreakDancer | WGS | DEL | 4,335.67 | 5,361.69 | 992.00 | 10,520.00 |
BreakDancer | WGS | DUP | 0 | 0 | 0 | 0 |
cn.mops | WGBS | DEL | 424.67 | 19.86 | 409.00 | 447.00 |
cn.mops | WGBS | DUP | 392.00 | 12.77 | 378.00 | 403.00 |
cn.mops | WGS | DEL | 201.67 | 37.07 | 160.00 | 231.00 |
cn.mops | WGS | DUP | 157.00 | 18.33 | 137.00 | 173.00 |
CNVnator | WGBS | DEL | 3,066.00 | 1,506.88 | 1,954.00 | 4,781.00 |
CNVnator | WGBS | DUP | 985.33 | 291.57 | 815.00 | 1,322.00 |
CNVnator | WGS | DEL | 2,066.33 | 1,124.41 | 1,324.00 | 3,360.00 |
CNVnator | WGS | DUP | 2,998.67 | 1,861.44 | 963.00 | 4,614.00 |
DELLY | WGBS | DEL | 42,788.00 | 32,305.60 | 19,843.00 | 79,732.00 |
DELLY | WGBS | DUP | 1,215.33 | 353.48 | 873.00 | 1,579.00 |
DELLY | WGS | DEL | 3,021.67 | 1,719.65 | 1,036.00 | 4,022.00 |
DELLY | WGS | DUP | 461.00 | 384.04 | 182.00 | 899.00 |
Pindel | WGBS | DEL | 458,175.33 | 234,456.02 | 291,944.00 | 726,345.00 |
Pindel | WGBS | DUP | 21,022.67 | 13,394.01 | 12,997.00 | 36,485.00 |
Pindel | WGS | DEL | 577,734.00 | 201,609.58 | 414,822.00 | 803,208.00 |
Pindel | WGS | DUP | 8,879.33 | 4,080.03 | 4,790.00 | 12,950.00 |
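The descriptive statistics in Table 2 can be cross-checked with a short sketch. It assumes that each row summarizes three BAM files (the three WGBS or three WGS files listed in Table 1) and that the reported standard deviation is the sample standard deviation. Neither assumption is stated on this page, and the helper name is hypothetical. Under those assumptions the third, unreported count follows from the mean, minimum, and maximum.

```python
import statistics


def middle_count(mean: float, minimum: int, maximum: int, n: int = 3) -> int:
    """Recover the one unreported count from mean, min and max (assumes n = 3)."""
    total = round(mean * n)  # the reported mean is rounded to two decimals
    return total - minimum - maximum


# cn.mops, WGBS, DEL row of Table 2: mean 424.67, min 409, max 447
mid = middle_count(424.67, 409, 447)
counts = [409, mid, 447]

# Recompute the summary statistics from the reconstructed counts
recomputed_mean = round(statistics.mean(counts), 2)
recomputed_sd = round(statistics.stdev(counts), 2)  # sample SD (n - 1 denominator)
```

For this row the reconstructed counts are 409, 418, and 447, and the recomputed sample standard deviation is 19.86, which agrees with the table and supports the assumption of three files per row.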