Combining transcription factor binding affinities with open-chromatin data for accurate gene expression prediction

Cited by: 70
Authors
Schmidt, Florian [1 ,2 ]
Gasparoni, Nina [3 ]
Gasparoni, Gilles [3 ]
Gianmoena, Kathrin [4 ]
Cadenas, Cristina [4 ]
Polansky, Julia K. [5 ]
Ebert, Peter [2 ,6 ]
Nordstroem, Karl [3 ]
Barann, Matthias [7 ]
Sinha, Anupam [7 ]
Froehler, Sebastian [8 ]
Xiong, Jieyi [8 ]
Amirabad, Azim Dehghani [1 ,2 ,6 ]
Ardakani, Fatemeh Behjati [1 ,2 ]
Hutter, Barbara [9 ]
Zipprich, Gideon
Felder, Baerbel [10 ]
Eils, Juergen [10 ]
Brors, Benedikt [9 ]
Chen, Wei [8 ]
Hengstler, Jan G. [4 ]
Hamann, Alf [6 ]
Lengauer, Thomas [2 ]
Rosenstiel, Philip [7 ]
Walter, Joern [3 ]
Schulz, Marcel H. [1 ,2 ]
Affiliations
[1] Cluster Excellence Multimodal Comp & Interact, Saarland Informat Campus, D-66123 Saarbrucken, Germany
[2] Max Planck Inst Informat, Computat Biol & Appl Algorithm, Saarland Informat Campus, D-66123 Saarbrucken, Germany
[3] Univ Saarland, Dept Genet, D-66123 Saarbrucken, Germany
[4] Leibniz Res Ctr Working Environm & Human Factors, D-44139 Dortmund, Germany
[5] German Rheumatism Res Ctr, Expt Rheumatol, D-10117 Berlin, Germany
[6] Int Max Planck Res Sch Comp Sci, Saarland Informat Campus, D-66123 Saarbrucken, Germany
[7] Univ Kiel, Inst Clin Mol Biol, D-24105 Kiel, Germany
[8] Max Delbruck Ctr Mol Med, Berlin Inst Med Syst Biol, D-13092 Berlin, Germany
[9] Deutsch Krebsforschungszentrum, Appl Bioinformat, D-69120 Heidelberg, Germany
[10] Deutsch Krebsforschungszentrum, Data Management & Genom IT, D-69120 Heidelberg, Germany
Keywords
CHIP-SEQ DATA; COEXPRESSION NETWORK ANALYSIS; DNA; SITES; GENOME; INTEGRATION; HYPERSENSITIVITY; REGULARIZATION; FOOTPRINTS; EXPANSION;
DOI
10.1093/nar/gkw1061
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010 ; 081704 ;
Abstract
The binding of transcription factors (TFs) and their contribution to cell-specific gene expression are often deduced from open-chromatin measurements to avoid costly TF ChIP-seq assays. Thus, it is important to develop computational methods for accurate TF binding prediction in open-chromatin regions (OCRs). Here, we report a novel segmentation-based method, TEPIC, to predict TF binding by combining sets of OCRs with position weight matrices. TEPIC can be applied to various open-chromatin data, e.g. DNaseI-seq and NOMe-seq. Additionally, histone marks (HMs) can be used to identify candidate TF binding sites. TEPIC computes TF affinities and uses open-chromatin/HM signal intensity as a quantitative measure of TF binding strength. Using machine learning, we find that low-affinity binding sites improve our ability to explain gene expression variability compared to the standard presence/absence classification of binding sites. Further, we show that both footprints and peaks capture essential TF binding events and lead to good prediction performance. In our application, gene-based scores computed by TEPIC with one open-chromatin assay nearly reach the quality of several TF ChIP-seq data sets. Finally, these scores correctly predict known transcriptional regulators, as illustrated by the application to novel DNaseI-seq and NOMe-seq data for primary human hepatocytes and CD4+ T-cells, respectively.
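The contrast the abstract draws between summed affinities and a presence/absence call can be illustrated with a minimal sketch. It is not the TEPIC implementation: the abstract does not specify the affinity model or how regions are assigned to genes, so the toy matrix, the uniform background, the likelihood-ratio score, and all function names below are assumptions made for illustration. The sketch scans only the forward strand and expects sequences containing only A, C, G and T.

```python
# Hypothetical toy PWM: one dict of base probabilities per motif position.
TOY_PWM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
]
BACKGROUND = 0.25  # assumed uniform background base frequency


def window_affinity(window, pwm):
    """Likelihood ratio of one window under the PWM versus background."""
    ratio = 1.0
    for base, column in zip(window, pwm):
        ratio *= column[base] / BACKGROUND
    return ratio


def region_affinity(sequence, pwm):
    """Sum the affinity of every window, so weak sites still contribute."""
    width = len(pwm)
    return sum(
        window_affinity(sequence[i:i + width], pwm)
        for i in range(len(sequence) - width + 1)
    )


def region_presence(sequence, pwm, threshold):
    """Binary call: 1 if any window reaches the threshold, else 0."""
    width = len(pwm)
    return int(any(
        window_affinity(sequence[i:i + width], pwm) >= threshold
        for i in range(len(sequence) - width + 1)
    ))


def gene_score(regions, pwm):
    """Signal-weighted sum of region affinities for one gene.

    regions: list of (sequence, signal) pairs, one per open-chromatin
    region assumed to be assigned to the gene; signal stands in for the
    open-chromatin or histone-mark intensity of that region.
    """
    return sum(signal * region_affinity(seq, pwm) for seq, signal in regions)
```

With a threshold of 1.0, a region such as "TTTT" is scored as absent by `region_presence`, yet it still adds a small positive term to `gene_score`. That is the kind of low-affinity contribution the abstract reports as useful for explaining expression variability.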
Pages: 54-66
Number of pages: 13
Related papers
50 in total
  • [1] TRANSCRIPTION FACTOR BINDING SITE PREDICTION WITH MULTIVARIATE GENE EXPRESSION DATA
    Zhang, Nancy R.
    Wildermuth, Mary C.
    Speed, Terence P.
    ANNALS OF APPLIED STATISTICS, 2008, 2 (01): : 332 - 365
  • [3] Measuring the impact of chromatin context on transcription factor binding affinities
    Lindeboom, Rik G. H.
    Neikes, Hannah K.
    NATURE BIOTECHNOLOGY, 2023, 41 (12) : 1696 - 1697
  • [4] Accurate inference of transcription factor binding from DNA sequence and chromatin accessibility data
    Pique-Regi, Roger
    Degner, Jacob F.
    Pai, Athma A.
    Gaffney, Daniel J.
    Gilad, Yoav
    Pritchard, Jonathan K.
    GENOME RESEARCH, 2011, 21 (03) : 447 - 455
  • [5] Predicting transcription factor binding motifs from DNA-binding domains, chromatin accessibility and gene expression data
    Zamanighomi, Mahdi
    Lin, Zhixiang
    Wang, Yong
    Jiang, Rui
    Wong, Wing Hung
    NUCLEIC ACIDS RESEARCH, 2017, 45 (10) : 5666 - 5677
  • [6] Disease-gene discovery by integration of 3D gene expression and transcription factor binding affinities
    Piro, Rosario M.
    Molineris, Ivan
    Di Cunto, Ferdinando
    Eils, Roland
    Koenig, Rainer
    BIOINFORMATICS, 2013, 29 (04) : 468 - 475
  • [7] Accurate Prediction of Inducible Transcription Factor Binding Intensities In Vivo
    Guertin, Michael J.
    Martins, Andre L.
    Siepel, Adam
    Lis, John T.
    PLOS GENETICS, 2012, 8 (03):
  • [8] Chromatin Signature and Transcription Factor Binding Provide a Predictive Basis for Understanding Plant Gene Expression
    Wu, Zefeng
    Tang, Jing
    Zhuo, Junjie
    Tian, Yuhan
    Zhao, Feiyang
    Li, Zhaohong
    Yan, Yubin
    Yang, Ruolin
    PLANT AND CELL PHYSIOLOGY, 2019, 60 (07) : 1471 - 1486
  • [9] Discovering transcription factor regulatory targets using gene expression and binding data
    Maienschein-Cline, Mark
    Zhou, Jie
    White, Kevin P.
    Sciammas, Roger
    Dinner, Aaron R.
    BIOINFORMATICS, 2012, 28 (02) : 206 - 213
  • [10] High-throughput chromatin information enables accurate tissue-specific prediction of transcription factor binding sites
    Whitington, Tom
    Perkins, Andrew C.
    Bailey, Timothy L.
    NUCLEIC ACIDS RESEARCH, 2009, 37 (01) : 14 - 25