MAGIC: A tool for predicting transcription factors and cofactors driving gene sets using ENCODE data

Authors
Roopra, Avtar [1]
Affiliations
[1] Univ Wisconsin Madison, Dept Neurosci, 5507 WIMR, Madison, WI 53706 USA
Keywords
CTCF; DEACETYLASE; EXPRESSION; REPRESSION; BINDING; GROWTH
DOI
10.1371/journal.pcbi.1007800; 10.1371/journal.pcbi.1007800.r001; 10.1371/journal.pcbi.1007800.r002; 10.1371/journal.pcbi.1007800.r003; 10.1371/journal.pcbi.1007800.r004; 10.1371/journal.pcbi.1007800.r005; 10.1371/journal.pcbi.1007800.r006
Chinese Library Classification
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Transcriptomic profiling is an immensely powerful hypothesis-generating tool. However, accurately predicting the transcription factors (TFs) and cofactors that drive transcriptomic differences between samples is challenging. A number of algorithms draw on ChIP-seq tracks to define TFs and cofactors behind gene changes. These approaches assign TFs and cofactors to genes via a binary designation of 'target' or 'non-target', followed by Fisher exact tests to assess enrichment of TFs and cofactors. ENCODE archives 2314 ChIP-seq tracks of 684 TFs and cofactors assayed across 117 human cell lines under a multitude of growth and maintenance conditions. The algorithm presented herein, Mining Algorithm for GenetIc Controllers (MAGIC), uses ENCODE ChIP-seq data to look for statistical enrichment of TFs and cofactors in gene bodies and flanking regions in gene lists without an a priori binary classification of genes as targets or non-targets. When compared to other TF mining resources, MAGIC displayed favourable performance in predicting TFs and cofactors that drive gene changes in 4 settings: 1) a cell line expressing or lacking a single TF; 2) breast tumors divided along PAM50 designations; 3) whole brain samples from WT mice or mice lacking a single TF in a particular neuronal subtype; 4) single-cell RNA-seq analysis of neurons divided by immediate early gene expression levels. In summary, MAGIC is a standalone application that produces meaningful predictions of TFs and cofactors in transcriptomic experiments.
Author summary
Key to the control of gene expression is the level of transcript in the cell. This level is controlled in large part by transcription factors (TFs) and cofactors. TFs are DNA-binding proteins that recognize specific sequence elements to control levels of gene activity. TFs recruit cofactors that do not themselves bind DNA but are brought to promoters via TFs to either enhance or repress gene expression. TFs and cofactors are thus key regulators of transcript levels. It is now routine to obtain the expression levels of every gene transcript in the genome, i.e., whole-transcriptome data. Understanding how the transcriptome is controlled is challenging. Herein, a method is described that predicts which Factors organize and control sets of genes. The algorithm is termed Mining Algorithm for GenetIc Controllers (MAGIC). MAGIC uses data derived from ChIP-seq tracks archived at ENCODE to decipher which Factors are most likely to preferentially bind lists of genes that are altered from one biological state to another. MAGIC circumvents the principal confounds of current methods to identify Factors and will aid in the discovery of organizing principles behind large-scale gene changes seen in physiology and disease.
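For readers unfamiliar with the binary 'target'/'non-target' approach that the abstract contrasts MAGIC with, the following is a minimal sketch of a one-sided Fisher exact (hypergeometric) enrichment test. It is not MAGIC's algorithm, which the abstract describes only as avoiding this binary step. The gene names, the set of binary target calls, and the function name `fisher_enrichment_p` are all hypothetical and chosen for illustration.

```python
from math import comb


def fisher_enrichment_p(k, n, K, N):
    """One-sided Fisher exact p-value for over-representation.

    k: genes in the query list called 'target' for the factor
    n: size of the query list
    K: genes in the whole population called 'target'
    N: size of the whole population (query list included)
    Returns P(X >= k) under the hypergeometric null.
    """
    upper = min(n, K)
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# Hypothetical toy data: 20 genes, 6 of which carry a binary 'target' call
# for some factor (e.g. a ChIP-seq peak above an arbitrary threshold).
all_genes = [f"gene{i}" for i in range(20)]
targets = set(all_genes[:6])

# Hypothetical query list: 5 genes altered between two conditions.
query = set(all_genes[:5])

k = len(query & targets)
p_value = fisher_enrichment_p(k, len(query), len(targets), len(all_genes))
print(f"targets in query: {k}/{len(query)}, p = {p_value:.3g}")
```

The result depends entirely on where the threshold for calling a gene a 'target' is drawn, which is the a priori classification the abstract says MAGIC does without.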
Pages: 20