Hereditas (Beijing) ›› 2014, Vol. 36 ›› Issue (7): 669-678. doi: 10.3724/SP.J.1005.2014.0669
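Zhangqun Wang1, Zengyan Xie2, Yingfan Cai3, Kunxian Shu2, Feifei Huang2
Received:
2013-10-28
Published:
2014-07-20
Online:
2014-06-23
Corresponding authors:
Zengyan Xie, lecturer, Ph.D., master's supervisor; research interests: bioinformatics and molecular evolution. E-mail: zengyanxie@gmail.com. Yingfan Cai, professor, Ph.D., doctoral supervisor; research interests: plant molecular biology and bioinformatics. E-mail: cyf@henu.edu.cn
About the author:
Zhangqun Wang, master's student; specialization: bioinformatics. Tel: 023-62461048; E-mail: 515544361@qq.com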
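Abstract: Phylogenomics is a new field that uses whole-genome data to construct phylogenetic trees. Whole-genome data can effectively eliminate the influence on phylogenetic trees of factors such as horizontal gene transfer and differences in gene evolutionary rates among taxa. According to the type of whole-genome data used, phylogenomic methods can be divided into five classes: multigene concatenation methods, methods based on gene content, methods based on gene order, methods based on the short-string (K-string) composition of sequences, and methods based on metabolic pathways. This article systematically summarizes the principle, speed, accuracy, and scope of application of each class of methods, together with their applications in various groups of organisms, and outlines the prospects of phylogenomics and the challenges it faces.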
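As a rough illustration of the fourth class of methods named in the abstract (those based on short-string composition), the sketch below computes an alignment-free distance between sequences from their K-string counts. It is a minimal example under stated assumptions, not the procedure of any particular published tool: the function names, the default `k = 3`, and the use of raw counts with a cosine-based distance are all illustrative choices. Published composition-vector methods typically add further steps, such as subtracting a background expectation, which are omitted here.

```python
from collections import Counter
from math import sqrt


def kstring_counts(seq, k):
    """Count every overlapping substring of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def composition_distance(a, b, k=3):
    """Return (1 - cosine similarity) / 2 between two K-string count vectors.

    Assumption: raw counts are compared directly; no background model
    is subtracted, unlike in published composition-vector methods.
    """
    ca, cb = kstring_counts(a, k), kstring_counts(b, k)
    # Only K-strings present in both sequences contribute to the dot product.
    dot = sum(ca[s] * cb[s] for s in ca.keys() & cb.keys())
    norm_a = sqrt(sum(v * v for v in ca.values()))
    norm_b = sqrt(sum(v * v for v in cb.values()))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("each sequence must be at least k characters long")
    return (1.0 - dot / (norm_a * norm_b)) / 2.0


def distance_matrix(seqs, k=3):
    """Pairwise distances for a dict mapping taxon name -> sequence."""
    names = sorted(seqs)
    return {
        (x, y): composition_distance(seqs[x], seqs[y], k)
        for i, x in enumerate(names)
        for y in names[i + 1:]
    }
```

The resulting pairwise distances could then be passed to any distance-based tree-building method, such as neighbor joining; that step is left out to keep the sketch short.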
Zhangqun Wang, Zengyan Xie, Yingfan Cai, Kunxian Shu, Feifei Huang. Advances in phylogenomics[J]. HEREDITAS (Beijing), 2014, 36(7): 669-678.