MAGIC: A tool for predicting transcription factors and cofactors driving gene sets using ENCODE data

Author
Roopra, Avtar [1]
Affiliation
[1] Univ Wisconsin Madison, Dept Neurosci, 5507 WIMR, Madison, WI 53706 USA
Keywords
CTCF; DEACETYLASE; EXPRESSION; REPRESSION; BINDING; GROWTH;
DOI
10.1371/journal.pcbi.1007800; 10.1371/journal.pcbi.1007800.r001; 10.1371/journal.pcbi.1007800.r002; 10.1371/journal.pcbi.1007800.r003; 10.1371/journal.pcbi.1007800.r004; 10.1371/journal.pcbi.1007800.r005; 10.1371/journal.pcbi.1007800.r006
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Transcriptomic profiling is an immensely powerful hypothesis-generating tool. However, accurately predicting the transcription factors (TFs) and cofactors that drive transcriptomic differences between samples is challenging. A number of algorithms draw on ChIP-seq tracks to define TFs and cofactors behind gene changes. These approaches assign TFs and cofactors to genes via a binary designation of 'target' or 'non-target', followed by Fisher's exact tests to assess enrichment of TFs and cofactors. ENCODE archives 2314 ChIP-seq tracks of 684 TFs and cofactors assayed across 117 human cell lines under a multitude of growth and maintenance conditions. The algorithm presented herein, Mining Algorithm for GenetIc Controllers (MAGIC), uses ENCODE ChIP-seq data to look for statistical enrichment of TFs and cofactors in gene bodies and flanking regions in gene lists without an a priori binary classification of genes as targets or non-targets. When compared to other TF mining resources, MAGIC displayed favourable performance in predicting TFs and cofactors that drive gene changes in 4 settings: 1) a cell line expressing or lacking a single TF; 2) breast tumors divided along PAM50 designations; 3) whole brain samples from WT mice or mice lacking a single TF in a particular neuronal subtype; 4) single-cell RNA-seq analysis of neurons divided by Immediate Early Gene expression levels. In summary, MAGIC is a standalone application that produces meaningful predictions of TFs and cofactors in transcriptomic experiments.
Author summary
Key to the control of gene expression is the level of transcript in the cell. This level is controlled in large part by transcription factors (TFs) and cofactors. TFs are DNA-binding proteins that recognize specific sequence elements to control levels of gene activity. TFs recruit cofactors that do not themselves bind DNA but are brought to promoters via TFs to either enhance or repress gene expression. TFs and cofactors are thus key regulators of transcript levels. It is now routine to obtain the expression levels of every gene transcript in the genome, i.e. whole transcriptome data. Understanding how the transcriptome is controlled is challenging. Herein, a method is described that predicts which Factors organize and control sets of genes. The algorithm is termed Mining Algorithm for GenetIc Controllers (MAGIC). MAGIC uses data derived from ChIP-seq tracks archived at ENCODE to decipher which Factors are most likely to preferentially bind lists of genes that are altered from one biological state to another. MAGIC circumvents the principal confounds of current methods to identify Factors and will aid in the discovery of organizing principles behind large-scale gene changes seen in physiology and disease.
Pages: 20