NetTIME: a multitask and base-pair resolution framework for improved transcription factor binding site prediction

Cited by: 1
Authors
Yi, Ren [1]
Cho, Kyunghyun [1, 2, 3]
Bonneau, Richard [1, 2, 3, 4]
Affiliations
[1] NYU, Dept Comp Sci, New York, NY 10011 USA
[2] NYU, Ctr Data Sci, New York, NY 10011 USA
[3] Genentech accelerator, New York, NY 10010 USA
[4] NYU, Dept Biol, New York, NY 10003 USA
Funding
National Institutes of Health (USA); National Science Foundation (USA)
Keywords
DNA; CHROMATIN; PROTEINS; ENHANCERS; INFERENCE;
DOI
10.1093/bioinformatics/btac569
Chinese Library Classification
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: Machine learning models for predicting cell-type-specific transcription factor (TF) binding sites have become increasingly accurate thanks to the increased availability of next-generation sequencing data and more standardized model evaluation criteria. However, knowledge transfer from data-rich to data-limited TFs and cell types remains crucial for improving TF binding prediction models because available binding labels are highly skewed towards a small collection of TFs and cell types. Transfer prediction of TF binding sites can potentially benefit from a multitask learning approach; however, existing methods typically use shallow single-task models to generate low-resolution predictions. Here, we propose NetTIME, a multitask learning framework for predicting cell-type-specific TF binding sites with base-pair resolution.
Results: We show that the multitask learning strategy for TF binding prediction is more efficient than the single-task approach due to the increased data availability. NetTIME trains high-dimensional embedding vectors to distinguish TF and cell-type identities. We show that this approach is critical for the success of the multitask learning strategy and allows our model to make accurate transfer predictions within and beyond the training panels of TFs and cell types. We additionally train a linear-chain conditional random field (CRF) to classify binding predictions and show that this CRF eliminates the need for setting a probability threshold and reduces classification noise. We compare our method's predictive performance with two state-of-the-art methods, Catchitt and Leopard, and show that our method outperforms previous methods under both supervised and transfer learning settings.
Pages: 4762-4770
Number of pages: 9