NetTIME: a multitask and base-pair resolution framework for improved transcription factor binding site prediction

Cited by: 1
Authors
Yi, Ren [1]
Cho, Kyunghyun [1, 2, 3]
Bonneau, Richard [1, 2, 3, 4]
Affiliations
[1] NYU, Dept Comp Sci, New York, NY 10011 USA
[2] NYU, Ctr Data Sci, New York, NY 10011 USA
[3] Genentech accelerator, New York, NY 10010 USA
[4] NYU, Dept Biol, New York, NY 10003 USA
Funding
National Institutes of Health (USA); National Science Foundation (USA)
Keywords
DNA; CHROMATIN; PROTEINS; ENHANCERS; INFERENCE;
DOI
10.1093/bioinformatics/btac569
Chinese Library Classification
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: Machine learning models for predicting cell-type-specific transcription factor (TF) binding sites have become increasingly accurate thanks to the increased availability of next-generation sequencing data and more standardized model evaluation criteria. However, knowledge transfer from data-rich to data-limited TFs and cell types remains crucial for improving TF binding prediction models because available binding labels are highly skewed towards a small collection of TFs and cell types. Transfer prediction of TF binding sites can potentially benefit from a multitask learning approach; however, existing methods typically use shallow single-task models to generate low-resolution predictions. Here, we propose NetTIME, a multitask learning framework for predicting cell-type-specific TF binding sites with base-pair resolution.
Results: We show that the multitask learning strategy for TF binding prediction is more efficient than the single-task approach due to the increased data availability. NetTIME trains high-dimensional embedding vectors to distinguish TF and cell-type identities. We show that this approach is critical for the success of the multitask learning strategy and allows our model to make accurate transfer predictions within and beyond the training panels of TFs and cell types. We additionally train a linear-chain conditional random field (CRF) to classify binding predictions and show that this CRF eliminates the need for setting a probability threshold and reduces classification noise. We compare our method's predictive performance with two state-of-the-art methods, Catchitt and Leopard, and show that our method outperforms previous methods under both supervised and transfer learning settings.
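Illustrative note (not part of the published abstract): the claim that a linear-chain CRF removes the need for a probability threshold can be sketched with Viterbi decoding over two labels, unbound (0) and bound (1). This is a minimal sketch and not the authors' implementation. The function name `viterbi_binary`, the `stay` parameter, and the hand-set transition weights are assumptions made for illustration; in the paper the CRF is trained rather than fixed by hand.

```python
import math

def viterbi_binary(probs, stay=0.9):
    """Return the most likely 0/1 label path for per-base binding probabilities.

    probs: per-position probability of the 'bound' label (hypothetical model output).
    stay:  assumed probability of keeping the previous label; hand-set here,
           whereas a real CRF would learn its transition scores from data.
    """
    if not probs:
        return []
    eps = 1e-9  # avoid log(0) for probabilities of exactly 0 or 1
    # trans[prev][cur]: log-score of moving from label prev to label cur
    trans = [[math.log(stay), math.log(1 - stay)],
             [math.log(1 - stay), math.log(stay)]]
    # emit[i][label]: log-score of assigning label at position i
    emit = [[math.log(max(1 - p, eps)), math.log(max(p, eps))] for p in probs]

    score = emit[0][:]   # best log-score of any path ending in each label
    back = []            # back-pointers, one pair per position after the first
    for e in emit[1:]:
        new_score, ptr = [], []
        for cur in (0, 1):
            cands = [score[prev] + trans[prev][cur] for prev in (0, 1)]
            best = 0 if cands[0] >= cands[1] else 1
            ptr.append(best)
            new_score.append(cands[best] + e[cur])
        back.append(ptr)
        score = new_score

    # Trace the best path backwards from the best final label
    label = 0 if score[0] >= score[1] else 1
    path = [label]
    for ptr in reversed(back):
        label = ptr[label]
        path.append(label)
    return path[::-1]

# An isolated spike above 0.5 would pass a 0.5 threshold,
# but the sequence-level decoding suppresses it.
noisy = [0.1, 0.1, 0.6, 0.1, 0.1]
print([int(p > 0.5) for p in noisy])  # [0, 0, 1, 0, 0]
print(viterbi_binary(noisy))          # [0, 0, 0, 0, 0]
```

The labels come from the highest-scoring path over the whole sequence, so no per-position cutoff is chosen, and short isolated calls are penalized by the transition scores.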
Pages: 4762-4770
Page count: 9