Prediction method for transcription factor binding sites integrating channel and spatial attention mechanisms

doi:10.16288/j.yczz.25-184

Hereditas (Beijing), 2026, Vol. 48, Issue 5: 522-534.

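Jihua Feng¹,², Zhongxing Chen¹,², Qilin Kang¹,², Longfei Li¹,², Jiahui Yang¹,², Yuting Zhang¹,²

¹ Department of Information Engineering, School of Electrical and Information Engineering, Yunnan Minzu University, Kunming 650504, China
² Yunnan Key Laboratory of Unmanned Autonomous System, Kunming 650504, China

Received: 2025-10-16  Revised: 2026-01-04  Published: 2026-05-20  Online: 2026-01-13
Corresponding author: Jihua Feng, Ph.D., Associate Professor; research interests: bioinformatics and machine learning; E-mail: fengjihua@ymu.edu.cn
Supported by:
National Natural Science Foundation of China (31160234)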

Abstract:

Accurate identification of transcription factor binding sites (TFBSs) at single-nucleotide resolution is a central problem in deciphering gene expression regulatory networks. To improve the cross-cell-type performance of existing computational models for TFBS prediction, we present a deep learning model that integrates channel and spatial attention mechanisms. The model was trained and tested on 51 chromatin immunoprecipitation sequencing (ChIP-seq) datasets covering 10 core transcription factors (e.g., CTCF, EGR1, FOXA1) in 13 human cell types (e.g., A549, GM12878, H1-hESC), together with 13 deoxyribonuclease I hypersensitive site sequencing (DNase-seq) datasets. Across the 23 TF-cell type combinations used for testing, the model achieved a mean area under the receiver operating characteristic curve (AUROC) of 0.986, and 91% of the combinations exceeded an AUROC of 0.970. The mean area under the precision-recall curve (AUPRC) was 0.169, more than 1,000-fold higher than the random-prediction baseline of 0.000156. On the nine TF-cell type datasets shared with representative state-of-the-art models, namely FactorNet, Leopard and DeepGRN, our model achieved a higher mean AUROC. Visualization analyses further showed that the model accurately identifies cell-type-specific TF binding sites. This study provides an efficient computational tool for precise cross-cell-type TFBS prediction and may support in-depth investigation of gene expression regulatory mechanisms and the molecular pathogenesis of related diseases.

Key words: transcription factor binding sites, attention mechanism, deep learning, single-nucleotide resolution, cross-cell prediction
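The AUPRC baseline quoted in the abstract is the fraction of positive positions: a classifier that ranks positions at random has an expected AUPRC close to the positive rate, while its expected AUROC is 0.5 whatever the class balance. The minimal NumPy sketch below illustrates this with synthetic labels; the 1% positive rate, sample size and seed are arbitrary assumptions, not the paper's data. It computes AUROC from ranks (Mann-Whitney statistic) and AUPRC as average precision, then checks the fold-improvement arithmetic.

```python
import numpy as np


def auroc(labels: np.ndarray, scores: np.ndarray) -> float:
    """AUROC via the Mann-Whitney rank statistic (assumes no tied scores)."""
    order = np.argsort(scores)
    ranks = np.empty(len(scores), dtype=float)
    ranks[order] = np.arange(1, len(scores) + 1)   # rank 1 = lowest score
    n_pos = labels.sum()
    n_neg = len(labels) - n_pos
    return float((ranks[labels == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg))


def auprc(labels: np.ndarray, scores: np.ndarray) -> float:
    """Average precision: mean precision at the rank of each positive (no ties)."""
    order = np.argsort(-scores)                    # highest score first
    sorted_labels = labels[order]
    true_pos = np.cumsum(sorted_labels)
    precision = true_pos / np.arange(1, len(labels) + 1)
    return float(precision[sorted_labels == 1].mean())


# Synthetic, heavily imbalanced labels (hypothetical: 1% positives).
rng = np.random.default_rng(0)
n, pos_rate = 200_000, 0.01
labels = np.zeros(n, dtype=int)
labels[rng.choice(n, size=int(n * pos_rate), replace=False)] = 1
random_scores = rng.random(n)                      # a classifier that knows nothing

print(f"random AUROC ~ {auroc(labels, random_scores):.3f}")   # close to 0.5
print(f"random AUPRC ~ {auprc(labels, random_scores):.4f}")   # close to 0.01

# Fold improvement reported in the abstract: mean AUPRC over the random baseline.
fold = 0.169 / 0.000156
print(f"fold improvement ~ {fold:.0f}")            # 1083
```

On genome-wide data the positive rate is far smaller (a baseline of 0.000156 means roughly 1.6 positive positions per 10,000), which is why AUROC can sit near 1 while AUPRC stays well below it.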

Cite this article: Jihua Feng, Zhongxing Chen, Qilin Kang, Longfei Li, Jiahui Yang, Yuting Zhang. Prediction method for transcription factor binding sites integrating channel and spatial attention mechanisms[J]. Hereditas (Beijing), 2026, 48(5): 522-534.

Experimental environment
Item	Specification
Computer model	ASUS TUF Gaming A15 FA507UV
Operating system	Ubuntu 24.04.1 LTS
Processor	AMD Ryzen 7 8845H w/ Radeon 780M Graphics (8 cores)
Motherboard	ASUS FA507UV
Memory	Samsung 16 GB DDR5 5600 MHz (8 GB + 8 GB)
Disk	SAMSUNG MZVL8512HELU-00BTW (512 GB), AI Mass Storage USB Device (32 GB)
Graphics card	AMD Radeon 780M Graphics (420 MB, ASUS)
Display	AU Optronics (AUO) AUOD2A2 (15.3 in)
Sound card	AMD High Definition Audio Device, NVIDIA High Definition Audio, Realtek High Definition Audio
Network adapter	Realtek PCIe GbE Family Controller, Realtek 8852BE Wireless LAN WiFi 6 PCI-E NIC

Training and test cell types for each TF
TF	Training set	Test set
CTCF	A549, H1-hESC, HepG2, IMR-90, K562, MCF-7	GM12878, HCT116, iPSC, PC-3
EGR1	GM12878, H1-hESC	K562, liver
FOXA1	HepG2	liver, MCF-7
FOXA2	HepG2	liver
GABPA	GM12878, H1-hESC, HeLa-S3, HepG2	K562, liver
HNF4A	HepG2	liver
JUND	HCT116, HepG2, K562, MCF-7	GM12878, H1-hESC, liver
NANOG	H1-hESC	iPSC
REST	H1-hESC, HeLa-S3, HepG2, PANC-1	A549, GM12878, K562, liver
TAF1	GM12878, H1-hESC, HeLa-S3, K562	A549, HepG2, liver
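Reading the table as a mapping from each TF to its training and test cell types makes the counts in the abstract easy to check. Assuming one ChIP-seq dataset per TF-cell type pair, the test column gives the 23 test combinations, the two columns together give the 51 ChIP-seq datasets, and the union of cell types gives 13, matching the 13 DNase-seq datasets if there is one per cell type (our inference). The sketch also confirms that training and test cell types are disjoint for every TF, which is what makes the evaluation cross-cell-type. The dictionary layout and the `summarize` helper are hypothetical, not the authors' data format.

```python
# Training and test cell types per TF, transcribed from the table above.
SPLIT = {
    "CTCF": (["A549", "H1-hESC", "HepG2", "IMR-90", "K562", "MCF-7"],
             ["GM12878", "HCT116", "iPSC", "PC-3"]),
    "EGR1": (["GM12878", "H1-hESC"], ["K562", "liver"]),
    "FOXA1": (["HepG2"], ["liver", "MCF-7"]),
    "FOXA2": (["HepG2"], ["liver"]),
    "GABPA": (["GM12878", "H1-hESC", "HeLa-S3", "HepG2"], ["K562", "liver"]),
    "HNF4A": (["HepG2"], ["liver"]),
    "JUND": (["HCT116", "HepG2", "K562", "MCF-7"], ["GM12878", "H1-hESC", "liver"]),
    "NANOG": (["H1-hESC"], ["iPSC"]),
    "REST": (["H1-hESC", "HeLa-S3", "HepG2", "PANC-1"],
             ["A549", "GM12878", "K562", "liver"]),
    "TAF1": (["GM12878", "H1-hESC", "HeLa-S3", "K562"], ["A549", "HepG2", "liver"]),
}


def summarize(split: dict) -> tuple:
    """Count TF-cell type pairs and cell types, and report train/test overlap per TF."""
    n_train = sum(len(train) for train, _ in split.values())
    n_test = sum(len(test) for _, test in split.values())
    cell_types = {c for train, test in split.values() for c in train + test}
    overlaps = {}
    for tf, (train, test) in split.items():
        shared = set(train) & set(test)
        if shared:                      # would violate the cross-cell-type setting
            overlaps[tf] = shared
    return n_train, n_test, len(cell_types), overlaps


n_train, n_test, n_cell_types, overlaps = summarize(SPLIT)
print(n_train, n_test, n_train + n_test, n_cell_types, overlaps)  # 28 23 51 13 {}
```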

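Comparison of attention designs: this study vs. CBAM
Method	Channel attention	Spatial attention	Dimensionality
This study	Average pooling	No pooling (full positional information retained)	1D
CBAM	Average pooling + max pooling	Average pooling + max pooling	2D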
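The table contrasts only the pooling choices of the two designs. The NumPy sketch below is an illustration under stated assumptions, not the authors' implementation: it works on a one-dimensional feature map of shape (channels, length); the channel branch follows the squeeze-and-excitation pattern (average pooling over length, a bottleneck MLP with an assumed reduction ratio `r`, then a sigmoid); and the "no pooling" spatial branch is assumed to be a length-preserving 1D convolution that maps all C channels directly to one weight per position. The CBAM-style functions apply CBAM's average-plus-max pooling in 1D purely for comparison, since CBAM itself is defined on 2D maps. All function names, sizes, the channel-then-spatial order and the random weights are hypothetical.

```python
import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


def conv1d_same(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Length-preserving 1D cross-correlation.

    x: (C_in, L) feature map; w: (C_out, C_in, K) kernel with odd K.
    """
    c_out, _, k = w.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad)))          # zero padding at both ends
    out = np.zeros((c_out, x.shape[1]))
    for t in range(x.shape[1]):
        out[:, t] = np.tensordot(w, xp[:, t:t + k], axes=([1, 2], [0, 1]))
    return out


def mlp(z: np.ndarray, w1: np.ndarray, w2: np.ndarray) -> np.ndarray:
    """Bottleneck MLP (C -> C/r -> C) with ReLU, as in SE/CBAM channel attention."""
    return w2 @ np.maximum(w1 @ z, 0.0)


def channel_attention_avg(x, w1, w2):
    """Channel branch using average pooling only, as the table lists for this study."""
    s = sigmoid(mlp(x.mean(axis=1), w1, w2))      # (C,) one weight per channel
    return x * s[:, None]


def spatial_attention_full(x, w):
    """Spatial branch without pooling: all C channels feed the position scores."""
    return x * sigmoid(conv1d_same(x, w))         # w: (1, C, K) -> (1, L) weights


def channel_attention_cbam(x, w1, w2):
    """CBAM-style channel branch: shared MLP on average- and max-pooled vectors."""
    s = sigmoid(mlp(x.mean(axis=1), w1, w2) + mlp(x.max(axis=1), w1, w2))
    return x * s[:, None]


def spatial_attention_cbam(x, w):
    """CBAM-style spatial branch: pool across channels to (2, L), then convolve."""
    pooled = np.stack([x.mean(axis=0), x.max(axis=0)])
    return x * sigmoid(conv1d_same(pooled, w))    # w: (1, 2, K)


rng = np.random.default_rng(0)
C, L, r, K = 16, 101, 4, 7                # hypothetical sizes
x = rng.standard_normal((C, L))           # stand-in for a convolutional feature map
w1 = rng.standard_normal((C // r, C)) * 0.1
w2 = rng.standard_normal((C, C // r)) * 0.1

# Assumed order: channel attention first, then spatial attention.
ours = spatial_attention_full(channel_attention_avg(x, w1, w2),
                              rng.standard_normal((1, C, K)) * 0.1)
cbam = spatial_attention_cbam(channel_attention_cbam(x, w1, w2),
                              rng.standard_normal((1, 2, K)) * 0.1)
print(ours.shape, cbam.shape)             # (16, 101) (16, 101)
```

Feeding the full (C, L) map to the spatial convolution keeps every channel's value at every position, at the cost of C × K kernel weights instead of 2 × K; this is one possible reading of "no pooling (full positional information retained)" and is not confirmed by the table.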

Runtime comparison on chromosomes 1, 8 and 21
Method	Chr1	Chr8	Chr21
Proposed method	13.97 min	7.84 min	2.78 min
"Many-to-one" model	137.06 h	89.49 h	29.02 h
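Converting the runtime table to a common unit shows the size of the gap: about 589- to 685-fold across the three chromosomes. A plausible reason, which is our assumption rather than something the table states, is that a many-to-one model emits one prediction per input window and must therefore be run once per nucleotide to reach single-nucleotide resolution, whereas a model that labels every position of a window needs roughly one pass per window. The `forward_passes` helper and the 1,000 bp window are hypothetical.

```python
import math

# Runtimes transcribed from the table above.
proposed_min = {"Chr1": 13.97, "Chr8": 7.84, "Chr21": 2.78}
many_to_one_h = {"Chr1": 137.06, "Chr8": 89.49, "Chr21": 29.02}

# Speedup = many-to-one runtime (hours -> minutes) / proposed runtime (minutes).
speedup = {chrom: many_to_one_h[chrom] * 60 / proposed_min[chrom]
           for chrom in proposed_min}
for chrom, s in speedup.items():
    print(f"{chrom}: {s:.0f}x")   # one line each: Chr1: 589x, Chr8: 685x, Chr21: 626x


def forward_passes(length_bp: int, window_bp: int, per_nucleotide_output: bool) -> int:
    """Hypothetical count of model evaluations needed to score every nucleotide.

    many-to-one: one window per base (centred on it, padded at the ends).
    per-nucleotide output: non-overlapping windows, each yielding window_bp labels.
    """
    if per_nucleotide_output:
        return math.ceil(length_bp / window_bp)
    return length_bp


# Illustrative numbers only: a 1 Mb region and a hypothetical 1,000 bp window.
print(forward_passes(1_000_000, 1_000, False), forward_passes(1_000_000, 1_000, True))
# 1000000 1000
```

The measured speedup need not equal the ratio of forward passes, because per-pass cost, batching and I/O differ between the two kinds of model.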
