Hereditas (Beijing) ›› 2024, Vol. 46 ›› Issue (8): 661-669. doi: 10.16288/j.yczz.24-102

• Technique and Method •

EC number prediction of protein sequences based on combination of hierarchical and global features

Fan Yang1,2,3,4, Qiaoling Han1,2,3,4, Wendi Zhao1,2,3,4, Yue Zhao1,2,3,4

  1. School of Technology, Beijing Forestry University, Beijing 100083, China
  2. Key Lab of State Forestry Administration for Forestry Equipment and Automation, Beijing 100083, China
  3. Beijing Laboratory of Urban and Rural Ecological Environment, Beijing 100083, China
  4. Research Center for Intelligent Forestry, Beijing Forestry University, Beijing 100083, China
  • Received: 2024-04-12 Revised: 2024-06-18 Online: 2024-08-20 Published: 2024-06-19
  • Contact: Yue Zhao E-mail: yangfan_muyi@163.com; zhaoyue0609@126.com
  • Supported by:
    National Natural Science Foundation of China (32071838); National Natural Science Youth Foundation of China (32101590)

Abstract:

Identifying enzyme function is crucial for understanding the mechanisms of biological activity and for advancing the life sciences. However, existing methods for predicting enzyme EC (Enzyme Commission) numbers do not fully exploit protein sequence information, and their identification accuracy remains limited. To address this issue, we propose an EC number prediction network using hierarchical features and global features (ECPN-HFGF). The method first uses residual networks to extract generic features from protein sequences, and then applies hierarchical feature extraction modules and global feature extraction modules to obtain hierarchical and global features of the sequences. The prediction results from the two feature types are then combined, and a multitask learning framework is used to predict enzyme EC numbers. In the experiments, ECPN-HFGF performed best among the compared methods at predicting EC numbers for protein sequences, achieving a macro F1 score of 95.5% and a micro F1 score of 99.0%. By effectively combining hierarchical and global features of protein sequences, ECPN-HFGF enables rapid and accurate EC number prediction. Compared with commonly used methods, it offers significantly higher prediction accuracy and provides an efficient approach for enzymology research and enzyme engineering applications.

Key words: enzyme function prediction, protein sequence, deep learning, hierarchical multi-label classification, global feature
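
The abstract reports two different F1 averages, and the gap between them (95.5% macro versus 99.0% micro) is easier to read with the definitions in hand. The sketch below is illustrative only: it assumes one predicted EC number per sequence and uses toy labels, whereas the article's actual evaluation protocol is not described in the abstract. The function names `f1` and `macro_micro_f1` are hypothetical helpers, not part of ECPN-HFGF.

```python
from collections import Counter


def f1(tp, fp, fn):
    """F1 from raw counts; defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_micro_f1(y_true, y_pred):
    """Return (macro_f1, micro_f1) for single-label predictions.

    Assumption: each sequence has exactly one true and one predicted label.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative

    labels = sorted(set(y_true) | set(y_pred))
    # Macro: average the per-class F1 scores, so every class counts equally.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    # Micro: pool the counts over all classes, so every sample counts equally.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return macro, micro


# Toy data (made up): a frequent class predicted perfectly, a rare class half missed.
y_true = ["1.1.1.1"] * 8 + ["3.4.21.4"] * 2
y_pred = ["1.1.1.1"] * 8 + ["3.4.21.4", "1.1.1.1"]
macro, micro = macro_micro_f1(y_true, y_pred)
print(f"macro F1 = {macro:.3f}, micro F1 = {micro:.3f}")
```

In this toy case the single error on the rare class lowers macro F1 much more than micro F1. A macro score below the micro score, as in the reported results, is therefore consistent with rarer EC classes being somewhat harder to predict than frequent ones.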