Hereditas (Beijing) [遗传], 2005, Vol. 27, Issue 4: 629-635.

• Research Report •

A Comparative Study of Methods for Finding the Boundaries Between Coding and Non-coding DNA Regions in Rice

孙奕钢;高 雷;张忠华;薛庆中   

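  1. College of Agriculture and Biotechnology, Zhejiang University, Hangzhou 310027
  • Received: 2004-07-08  Revised: 2004-11-19  Publication date: 2005-08-10  Release date: 2005-08-10
  • Corresponding author: 薛庆中 (XUE Qing-Zhong)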

Comparison Study on the Methods for Finding Borders Between Coding and Non-coding DNA Regions in Rice

SUN Yi-Gang, GAO Lei, ZHANG Zhong-Hua, XUE Qing-Zhong

  1. Department of Mathematics, Zhejiang University, Hangzhou 310027, China
  • Received: 2004-07-08  Revised: 2004-11-19  Online: 2005-08-10  Published: 2005-08-10
  • Contact: XUE Qing-Zhong

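Abstract: Entropy-based divergence measures provide a powerful tool for problems such as analyzing DNA sequence complexity, predicting gene coding regions, and identifying the boundaries between coding and non-coding DNA. To improve the efficiency of finding the boundaries between coding and non-coding regions of rice genes, this paper proposes two new divergence measures (the α-KL divergence and the α-Jensen-Shannon divergence) and constructs a coarse-grained vector over the codons corresponding to amino acids, based on codon GC content. The accuracies obtained by combining different vectors with the Jensen-Shannon divergence, the Jensen-Renyi divergence, the α-KL divergence, and the α-Jensen-Shannon divergence were compared. The results show that, in recognizing the 'stop codon' of rice gene coding regions, the constructed codon coarse-grained vector combined with the newly introduced measures raised recognition efficiency by 4 to 5 times over the method of Bernaola et al. (2000).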

Keywords: Jensen-Renyi divergence, Jensen-Shannon divergence, α-KL divergence, α-Jensen-Shannon divergence, GC content

Abstract: Entropy-based divergence measures provide a powerful tool for evaluating sequence complexity, predicting CpG islands, and detecting borders between coding and non-coding DNA regions. In this paper, two new divergence measures, the α-KL divergence and the α-Jensen-Shannon divergence, are defined, and a coarse-graining vector of the codons corresponding to amino acids is proposed according to codon GC content, in order to improve the computational approach to finding borders between coding and non-coding regions in rice. A comparison of the accuracies obtained with different vectors under the Jensen-Shannon divergence, the Jensen-Renyi divergence, the α-KL divergence, and the α-Jensen-Shannon divergence for detecting borders between coding and non-coding DNA regions showed that the recognition efficiency of the new information measures with the coarse-graining vector increased by 4 to 5 times for the 'stop codon' of coding regions in rice, compared with Bernaola's method.
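For orientation, the general idea of locating a border by maximizing the Jensen-Shannon divergence between the two sides of a candidate cut can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: all function names are hypothetical, the coarse-graining shown (each codon mapped to its count of G/C bases) is only an assumed stand-in for the GC-content-based vector mentioned in the abstract, and the α-KL and α-Jensen-Shannon divergences are not implemented because the abstract does not define them.

```python
import math
from collections import Counter


def shannon_entropy(symbols):
    """Shannon entropy (in bits) of the empirical symbol distribution."""
    total = len(symbols)
    counts = Counter(symbols)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def codon_gc_symbols(seq, frame=0):
    """ASSUMED coarse-graining: map each codon to its number of G/C bases (0..3).

    The paper's actual coarse-grained vector is not given in the abstract.
    """
    seq = seq.upper()
    return [sum(base in "GC" for base in seq[i:i + 3])
            for i in range(frame, len(seq) - 2, 3)]


def js_divergence_at(symbols, cut):
    """Length-weighted Jensen-Shannon divergence between symbols[:cut] and symbols[cut:]."""
    n = len(symbols)
    left, right = symbols[:cut], symbols[cut:]
    return (shannon_entropy(symbols)
            - (len(left) / n) * shannon_entropy(left)
            - (len(right) / n) * shannon_entropy(right))


def best_cut(symbols):
    """Return (cut, divergence) for the interior cut with the largest divergence."""
    return max(((cut, js_divergence_at(symbols, cut))
                for cut in range(1, len(symbols))),
               key=lambda pair: pair[1])


if __name__ == "__main__":
    # Toy sequence: an AT-rich stretch followed by a GC-rich stretch.
    toy = "ATA" * 10 + "GCG" * 10
    cut, divergence = best_cut(codon_gc_symbols(toy))
    print(f"candidate border after codon {cut}, divergence = {divergence:.3f} bits")
```

On the toy sequence the maximum falls at the junction between the two compositionally different stretches. Real genomic data would additionally need a significance criterion for accepting a cut, which this sketch omits.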
