Hereditas (Beijing), 2014, Vol. 36, Issue 3: 237-247. doi: 10.3724/SP.J.1005.2014.0237

Review

The ENCODE project and functional genomics studies

Nan Ding¹,², Hongzhu Qu¹, Xiangdong Fang¹

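  1. CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China;
  2. University of Chinese Academy of Sciences, Beijing 100049, China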
  • Received: 2013-09-17; Revised: 2013-12-23; Published: 2014-03-20; Online: 2014-02-27
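  • Corresponding author: Xiangdong Fang, Ph.D., Professor; research interests: omics and translational medicine of stem cells and major diseases. E-mail: fangxd@big.ac.cn
  • Author profile: Nan Ding, Ph.D. candidate; research focus: genomic data mining. Tel: 010-84097538; E-mail: dingnan@big.ac.cn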
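  • Funding: Supported by a sub-project of the Strategic Priority Research Program of the Chinese Academy of Sciences on stem cell and regenerative medicine research (No. XDA01040405)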

Abstract:

Since the completion of the Human Genome Project, scientists have been working to interpret the biological meaning of the information encoded in the human genome. Beginning in 2003, the US National Human Genome Research Institute (NHGRI) invested nearly US$300 million to launch the Encyclopedia of DNA Elements (ENCODE) project. The project brought together more than 440 scientists from 32 laboratories in the United States, China, the United Kingdom, Japan, Spain, and Singapore to identify and analyze all functional regulatory elements in the human genome. Driven by advances in experimental methods such as high-throughput sequencing and by continuous improvements in bioinformatics, ENCODE has produced substantial results. It mapped epigenetically modified regions, such as sites of DNA methylation and histone modification, characterized their effects on chromatin structure, and showed that changes in chromatin structure in turn affect gene expression. It identified transcription factors and their binding sites and constructed transcription factor regulatory networks. It revised and updated the pseudogene and non-coding RNA databases. Finally, it identified single nucleotide polymorphisms (SNPs) in regulatory sequences and associated them with diseases. These findings help to systematically decode the information in genes and genomes, the roles of regulatory elements, and the molecular mechanisms of transcriptional regulation by non-coding regions, and they provide a rich data resource for the life sciences, particularly translational medicine. In this review, we summarize how advances in high-throughput sequencing and bioinformatics contributed to the ENCODE project, the relationship between epigenetics research and the ENCODE project, and the project's major scientific achievements. We also discuss the prospects of the ENCODE project for advancing basic, clinical, and translational medicine.

Key words: ENCODE, epigenetics, next-generation sequencing, transcriptional regulation