Theoretical and empirical quality assessment of transcription factor-binding motifs

Cited by: 46
Authors
Medina-Rivera, Alejandra [1, 2]
Abreu-Goodger, Cei [3]
Thomas-Chollier, Morgane [4]
Salgado, Heladia [1]
Collado-Vides, Julio [1]
van Helden, Jacques [1, 2]
Affiliations
[1] Univ Nacl Autonoma Mexico, Ctr Ciencias Genom, Col Chamilpa 62210, Morelos, Mexico
[2] Univ Libre Bruxelles, Lab Bioinformat Genomes & Reseaux BiGRe, B-1050 Brussels, Belgium
[3] EMBL European Bioinformat Inst, Cambridge CB10 1SD, England
[4] Max Planck Inst Mol Genet, Dept Computat Mol Biol, D-14195 Berlin, Germany
Funding
US National Institutes of Health (NIH)
Keywords
OPEN-ACCESS DATABASE; SEQUENCE-ANALYSIS TOOLS; ESCHERICHIA-COLI K-12; REGULATORY ELEMENTS; DNA; PROFILES; GENOME; SITES; PROMOTERS; REGULONDB;
DOI
10.1093/nar/gkq710
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
Position-specific scoring matrices (PSSMs) are routinely used to predict transcription factor (TF)-binding sites in genome sequences. However, their reliability to predict novel binding sites can be far from optimum, due to the use of a small number of training sites or the inappropriate choice of parameters when building the matrix or when scanning sequences with it. Measures of matrix quality such as E-value and information content rely on theoretical models, and may fail in the context of full genome sequences. We propose a method, implemented in the program 'matrix-quality', that combines theoretical and empirical score distributions to assess reliability of PSSMs for predicting TF-binding sites. We applied 'matrix-quality' to estimate the predictive capacity of matrices for bacterial, yeast and mouse TFs. The evaluation of matrices from RegulonDB revealed some poorly predictive motifs, and allowed us to quantify the improvements obtained by applying multi-genome motif discovery. Interestingly, the method reveals differences between global and specific regulators. It also highlights the enrichment of binding sites in sequence sets obtained from high-throughput ChIP-chip (bacterial and yeast TFs) and ChIP-seq experiments (mouse TFs). The method presented here has many applications, including: selecting reliable motifs before scanning sequences; improving motif collections in TF databases; evaluating motifs discovered using high-throughput data sets.
Pages: 808-824
Number of pages: 17