遗传 (Hereditas, Beijing), 2008, Vol. 30, Issue 9: 1169-1174. doi: 10.3724/SP.J.1005.2008.01169

• Research Report •

Analysis of correlation of local GC level in human protein coding genes

CHEN Xiang-Gui (陈祥贵); HU Jun (胡军); YANG Xiao (杨潇)

  1. School of Bioengineering, Xihua University, Chengdu 610039, China

  • Received: 2007-12-26 Revised: 2008-01-18 Online: 2008-09-10 Published: 2008-09-10
  • Contact: CHEN Xiang-Gui

Abstract:

GC content is an important feature of the base composition of genomic DNA and carries information about gene structure, function and evolution. In this study, nonredundant DNA sequences of 7 992 human protein-coding genes were retrieved from public databases, and the local GC content of different gene regions and the correlations among them were analyzed. Local GC content was heterogeneous across gene regions: 5′ untranslated regions (5′ UTRs) had the highest average GC content, 62.56%, whereas 3′ untranslated regions (3′ UTRs) had the lowest, 43.97%. The GC content of 3′ flanking sequences closely matched the GC level of the long DNA segments in which the genes are located. Although the GC content of open reading frames (ORFs) was higher than that of introns, 3′ UTRs and 3′ flanking sequences, the GC contents of these four regions were highly correlated with one another. The average GC content at the third codon position (GC3) was 58.09%, significantly higher than at the first and second codon positions, and was highly correlated with the GC content of ORFs (correlation coefficient 0.91). GC3 also correlated well with the GC content of introns, 3′ UTRs and 3′ flanking sequences, and the linear regression of GC3 on the GC content of 3′ flanking sequences yielded a slope of 1.25. GC3 can therefore serve as a sensitive indicator of variation in the GC level of the genomic region in which a gene lies. By contrast, the GC levels of the first and second codon positions, 5′ flanking sequences and 5′ UTRs correlated only weakly with those of the other gene regions. These results suggest that third codon positions, introns, 3′ UTRs and 3′ flanking sequences may have undergone similar evolutionary processes, whereas first and second codon positions, 5′ flanking sequences and 5′ UTRs have been subject to different mutation and selection because of functional requirements.

Keywords: human protein-coding genes, correlation, local GC content
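
The quantities named in the abstract (regional GC content, GC at each codon position, a correlation coefficient, and a regression slope) can be illustrated with a minimal Python sketch. This is not the authors' pipeline, which the abstract does not describe. The function names `gc_content`, `codon_position_gc`, `pearson_r` and `regression_slope` are hypothetical, and the numeric lists are toy values invented for illustration, chosen so that the slope equals the 1.25 reported for GC3 against 3′ flanking GC content.

```python
from statistics import mean


def gc_content(seq):
    """Return the percentage of G and C among the A/C/G/T bases of seq."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    total = gc + seq.count("A") + seq.count("T")
    return 100.0 * gc / total if total else 0.0


def codon_position_gc(orf):
    """Return (GC1, GC2, GC3) for an in-frame ORF.

    A trailing partial codon, if any, is ignored.
    """
    usable = len(orf) - len(orf) % 3
    return tuple(gc_content(orf[i:usable:3]) for i in range(3))


def _centered_sums(xs, ys):
    """Return the sums of cross products and squares about the means."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy, sxx, syy


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    sxy, sxx, syy = _centered_sums(xs, ys)
    return sxy / (sxx * syy) ** 0.5


def regression_slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    sxy, sxx, _ = _centered_sums(xs, ys)
    return sxy / sxx


# Toy per-gene values (assumed, NOT data from the paper).
flank_gc = [40.0, 50.0, 60.0]   # GC% of 3' flanking sequence per gene
gc3 = [35.0, 47.5, 60.0]        # GC% at third codon position per gene

print(codon_position_gc("ATGGCC"))        # GC1, GC2, GC3 of a two-codon ORF
print(pearson_r(flank_gc, gc3))           # correlation between the regions
print(regression_slope(flank_gc, gc3))    # slope of GC3 on flanking GC
```

A slope above 1 in the last call means that GC3 changes by more than one percentage point for each percentage point of change in flanking GC content, which is the sense in which the abstract calls GC3 a sensitive indicator of the regional GC level.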