BertSNR: an interpretable deep learning framework for single-nucleotide resolution identification of transcription factor binding sites based on DNA language model

Authors
Luo, Hanyu [1, 2]
Tang, Li [1]
Zeng, Min [1]
Yin, Rui [3]
Ding, Pingjian [4]
Luo, Lingyun [2]
Li, Min [1]
Affiliations
[1] Cent South Univ, Sch Comp Sci & Engn, 932 South Lushan Rd, Changsha 410083, Hunan, Peoples R China
[2] Univ South China, Sch Comp Sci, 28 West Changsheng Rd, Hengyang 421001, Hunan, Peoples R China
[3] Univ Florida, Dept Hlth Outcome & Biomed Informat, Gainesville, FL 32611 USA
[4] Case Western Reserve Univ, Ctr Artificial Intelligence Drug Discovery, Sch Med, Cleveland, OH 44106 USA
Funding
National Natural Science Foundation of China
Keywords
CHIP-SEQ; DATABASE; OCT4;
DOI
10.1093/bioinformatics/btae461
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Motivation: Transcription factors are pivotal regulators of gene expression, and accurately identifying transcription factor binding sites (TFBSs) at high resolution is crucial for understanding the mechanisms of gene regulation. Identifying TFBSs from DNA sequences remains a significant challenge in computational biology. A variety of computational approaches have been developed to address it; however, these methods are limited in their ability to achieve high-resolution identification and often lack interpretability.

Results: We propose BertSNR, an interpretable deep learning framework for identifying TFBSs at single-nucleotide resolution. BertSNR integrates sequence-level and token-level information through multi-task learning built on pre-trained DNA language models. Benchmarking comparisons show that BertSNR outperforms existing state-of-the-art methods in TFBS prediction. Importantly, we enhanced the interpretability of the model through attention-weight visualization and motif analysis, and uncovered a subtle relationship between attention weights and motifs. Moreover, BertSNR effectively identifies TFBSs in promoter regions, facilitating the study of intricate gene regulation.

Availability and implementation: The BertSNR source code is available at https://github.com/lhy0322/BertSNR.
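The abstract states that BertSNR combines sequence-level and token-level information through multi-task learning. As a rough illustration only, the plain-Python sketch below assumes two binary tasks: whether a whole sequence contains a TFBS, and whether each position lies inside one. It further assumes one token per nucleotide, combines the two binary cross-entropy losses with a hypothetical weight `alpha`, and merges per-nucleotide probabilities into site intervals with a hypothetical 0.5 threshold. The function names, loss form, weighting, and threshold are all assumptions for illustration, not BertSNR's actual implementation, which is in the linked repository.

```python
import math
from typing import List, Sequence, Tuple


def bce(p: float, y: int, eps: float = 1e-7) -> float:
    """Binary cross-entropy for one predicted probability p and a label y in {0, 1}."""
    p = min(max(p, eps), 1.0 - eps)  # clip so log() never receives 0
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def multitask_loss(seq_prob: float, seq_label: int,
                   token_probs: Sequence[float], token_labels: Sequence[int],
                   alpha: float = 0.5) -> float:
    """Hypothetical weighted sum of a sequence-level and a token-level loss.

    seq_prob / seq_label: does the whole sequence contain a TFBS?
    token_probs / token_labels: is each position (assumed: one nucleotide
    per token) inside a TFBS?
    alpha: hypothetical weight balancing the two tasks, in [0, 1].
    """
    if len(token_probs) != len(token_labels) or not token_labels:
        raise ValueError("token_probs and token_labels must be non-empty and equal length")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    seq_term = bce(seq_prob, seq_label)
    # Average the per-position losses so sequence length does not dominate.
    token_term = sum(bce(p, y) for p, y in zip(token_probs, token_labels)) / len(token_labels)
    return alpha * seq_term + (1.0 - alpha) * token_term


def predicted_sites(token_probs: Sequence[float],
                    threshold: float = 0.5) -> List[Tuple[int, int]]:
    """Merge consecutive positions with probability >= threshold into
    half-open (start, end) intervals, i.e. single-nucleotide-resolution sites."""
    sites: List[Tuple[int, int]] = []
    start = None
    for i, p in enumerate(token_probs):
        if p >= threshold and start is None:
            start = i  # a new site begins here
        elif p < threshold and start is not None:
            sites.append((start, i))  # the current site ended at i - 1
            start = None
    if start is not None:
        sites.append((start, len(token_probs)))  # site runs to the end
    return sites
```

For example, `predicted_sites([0.1, 0.9, 0.8, 0.2, 0.7])` returns `[(1, 3), (4, 5)]`, marking positions 1 to 2 and position 4 as binding. Averaging the token-level term keeps its scale comparable to the single sequence-level term, which is one simple way to make a fixed `alpha` meaningful across sequence lengths.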
Pages: 10