Prediction of Transcription Factor Binding Sites Using a Combined Deep Learning Approach

Cited by: 5
Authors
Cao, Linan [1]
Liu, Pei [1]
Chen, Jialong [1]
Deng, Lei [1]
Affiliations
[1] Central South University, School of Computer Science and Engineering, Changsha, People's Republic of China
Source
FRONTIERS IN ONCOLOGY | 2022 / Volume 12
Funding
National Natural Science Foundation of China;
Keywords
transcription factor binding sites; attention mechanism; positional embedding; deep learning; DNA; REPRESENTATION; SEQUENCES;
DOI
10.3389/fonc.2022.893520
Chinese Library Classification (CLC) number
R73 [Oncology];
Discipline classification code
100214;
Abstract
The binding of transcription factors (TFs) to TF binding sites (TFBS) plays a vital role in regulating gene expression and evolution, including processes such as DNA replication and mRNA transcription. Precisely modeling the specificity of genes and searching for TFBS help to explore the mechanism of cell expression. In recent years, computational and deep learning methods for finding TFBS have become an active field of research. However, existing methods generally cannot achieve high performance and interpretability simultaneously. Here, we develop an accurate and interpretable attention-based hybrid approach, DeepARC, that combines a convolutional neural network (CNN) and a recurrent neural network (RNN) to predict TFBS. DeepARC employs a positional embedding method to extract the hidden embedding from DNA sequences, combining the positional information from OneHot encoding with the distributed embedding from DNA2Vec. DeepARC feeds the positional embedding of the DNA sequence into a CNN-BiLSTM-Attention-based framework to complete the motif-finding task. Taking advantage of the attention mechanism, DeepARC gains greater access to valuable information about the motif and brings interpretability to motif searching through the attention weight graph. Moreover, DeepARC achieves promising performance, with an average area under the receiver operating characteristic curve (AUC) of 0.908 across five cell lines (A549, GM12878, Hep-G2, H1-hESC, and HeLa) in the benchmark dataset. We also compare the positional embedding with OneHot and DNA2Vec and find that it offers a competitive advantage.
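The abstract does not say how the OneHot and DNA2Vec signals are merged or how the attention weights are computed, so the following is only a minimal sketch under stated assumptions, not the DeepARC implementation. It assumes the two encodings are concatenated per k-mer window, uses a random lookup table as a hypothetical stand-in for pretrained DNA2Vec vectors, and applies simple dot-product attention directly to the embedding (the CNN and BiLSTM stages are omitted). The names `kmer_table`, `combined_embedding`, and `attention_pool`, and the `query` vector, are illustrative only.

```python
import math
import random

BASES = "ACGT"


def one_hot(seq):
    """One-hot encode each base as a 4-dim vector (unknown bases become all zeros)."""
    return [[1.0 if b == base else 0.0 for base in BASES] for b in seq.upper()]


def kmer_table(k, dim, seed=0):
    """Hypothetical stand-in for a pretrained DNA2Vec table: one random vector per k-mer."""
    rng = random.Random(seed)
    kmers = [""]
    for _ in range(k):
        kmers = [prefix + b for prefix in kmers for b in BASES]
    return {kmer: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for kmer in kmers}


def combined_embedding(seq, table, k):
    """Assumed combination: for each k-mer window, concatenate the one-hot vector
    of its first base with the k-mer's distributed vector."""
    seq = seq.upper()
    dim = len(next(iter(table.values())))
    hot = one_hot(seq)
    rows = []
    for i in range(len(seq) - k + 1):
        kmer_vec = table.get(seq[i:i + k], [0.0] * dim)  # unseen k-mers map to zeros
        rows.append(hot[i] + kmer_vec)
    return rows


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(features, query):
    """Dot-product attention: score each position against `query`, normalize the
    scores with softmax, and return the weighted sum plus the per-position weights."""
    scores = [sum(f * q for f, q in zip(row, query)) for row in features]
    weights = softmax(scores)
    width = len(features[0])
    pooled = [sum(w * row[j] for w, row in zip(weights, features)) for j in range(width)]
    return pooled, weights


# Usage: embed a short sequence and inspect which positions receive the most weight.
table = kmer_table(k=3, dim=8)
embedding = combined_embedding("ACGTACGTTAGC", table, k=3)
query = [0.1] * len(embedding[0])  # in a trained model this vector would be learned
pooled, weights = attention_pool(embedding, query)
print(len(embedding), len(embedding[0]))
print([round(w, 3) for w in weights])
```

The per-position `weights` are the kind of quantity an attention weight graph would plot: positions with larger weights contribute more to the pooled representation, which is what makes the attention stage inspectable.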
Pages: 10