MAGIC: A tool for predicting transcription factors and cofactors driving gene sets using ENCODE data

Author
Roopra, Avtar [1]
Affiliation
[1] Univ Wisconsin Madison, Dept Neurosci, 5507 WIMR, Madison, WI 53706 USA
Keywords
CTCF; DEACETYLASE; EXPRESSION; REPRESSION; BINDING; GROWTH
DOI
10.1371/journal.pcbi.1007800; 10.1371/journal.pcbi.1007800.r001; 10.1371/journal.pcbi.1007800.r002; 10.1371/journal.pcbi.1007800.r003; 10.1371/journal.pcbi.1007800.r004; 10.1371/journal.pcbi.1007800.r005; 10.1371/journal.pcbi.1007800.r006
Chinese Library Classification
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Transcriptomic profiling is an immensely powerful hypothesis-generating tool. However, accurately predicting the transcription factors (TFs) and cofactors that drive transcriptomic differences between samples is challenging. A number of algorithms draw on ChIP-seq tracks to define TFs and cofactors behind gene changes. These approaches assign TFs and cofactors to genes via a binary designation of 'target' or 'non-target', followed by Fisher exact tests to assess enrichment of TFs and cofactors. ENCODE archives 2314 ChIP-seq tracks of 684 TFs and cofactors assayed across 117 human cell lines under a multitude of growth and maintenance conditions. The algorithm presented herein, Mining Algorithm for GenetIc Controllers (MAGIC), uses ENCODE ChIP-seq data to look for statistical enrichment of TFs and cofactors in gene bodies and flanking regions in gene lists without an a priori binary classification of genes as targets or non-targets. When compared to other TF mining resources, MAGIC displayed favourable performance in predicting TFs and cofactors that drive gene changes in 4 settings: 1) a cell line expressing or lacking a single TF; 2) breast tumors divided along PAM50 designations; 3) whole brain samples from WT mice or mice lacking a single TF in a particular neuronal subtype; 4) single-cell RNA-seq analysis of neurons divided by Immediate Early Gene expression levels. In summary, MAGIC is a standalone application that produces meaningful predictions of TFs and cofactors in transcriptomic experiments.

Author summary
Key to the control of gene expression is the level of transcript in the cell. This level is controlled in large part by transcription factors (TFs) and cofactors. TFs are DNA-binding proteins that recognize specific sequence elements to control levels of gene activity. TFs recruit cofactors that do not themselves bind DNA but are brought to promoters via TFs to either enhance or repress gene expression. TFs and cofactors are thus key regulators of transcript levels. It is now routine to obtain the expression levels of every gene transcript in the genome, i.e. whole-transcriptome data. Understanding how the transcriptome is controlled is challenging. Herein, a method is described that predicts which Factors organize and control sets of genes. The algorithm is termed Mining Algorithm for GenetIc Controllers (MAGIC). MAGIC uses data derived from ChIP-seq tracks archived at ENCODE to decipher which Factors are most likely to preferentially bind lists of genes that are altered from one biological state to another. MAGIC circumvents the principal confounds of current methods to identify Factors and will aid in the discovery of organizing principles behind large-scale gene changes seen in physiology and disease.
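
The contrast drawn in the abstract, between binary target calling followed by a Fisher exact test and enrichment testing without an a priori target designation, can be illustrated with a small sketch. The abstract does not name the statistic MAGIC uses, so the threshold-free half below assumes a two-sample Kolmogorov-Smirnov statistic purely as one possible choice. The function names, the ChIP-seq scores, and the threshold of 5.0 are all hypothetical and are not taken from the paper or from ENCODE.

```python
from math import comb


def fisher_enrichment_p(targets_in_list, list_size, targets_total, genome_size):
    """One-sided Fisher exact (hypergeometric) p-value for over-representation
    of binary 'target' genes in a gene list."""
    upper = min(list_size, targets_total)
    tail = sum(
        comb(targets_total, k) * comb(genome_size - targets_total, list_size - k)
        for k in range(targets_in_list, upper + 1)
    )
    return tail / comb(genome_size, list_size)


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between the
    empirical CDFs of the two samples. No target threshold is needed."""
    a, b = sorted(sample_a), sorted(sample_b)
    i = j = 0
    d = 0.0
    for x in sorted(set(a) | set(b)):
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


# Hypothetical ChIP-seq signal for one Factor (invented for illustration).
query_scores = [8.1, 7.4, 6.9, 5.2, 9.3]                        # genes in the list
background_scores = [1.0, 2.2, 0.5, 3.1, 1.8, 2.7, 0.9, 4.0]    # all other genes

# Binary approach: call a gene a 'target' if its score exceeds an assumed cutoff.
THRESHOLD = 5.0  # hypothetical cutoff; the result depends on this choice
all_scores = query_scores + background_scores
targets_total = sum(s > THRESHOLD for s in all_scores)
targets_in_list = sum(s > THRESHOLD for s in query_scores)
p_binary = fisher_enrichment_p(
    targets_in_list, len(query_scores), targets_total, len(all_scores)
)

# Threshold-free approach: compare the two score distributions directly.
d_stat = ks_statistic(query_scores, background_scores)

print(f"Fisher p-value (binary targets): {p_binary:.5f}")
print(f"KS statistic (no threshold):     {d_stat:.2f}")
```

The binary route requires choosing a cutoff before any test can be run, whereas the distribution-based route uses every gene's score as-is. That is the general idea behind avoiding an a priori target classification, though the actual MAGIC procedure may differ in its details.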
Pages: 20