Evaluating a linear k-mer model for protein-DNA interactions using high-throughput SELEX data

Cited by: 6
Authors
Kahara, Juhani [1 ]
Lahdesmaki, Harri [1 ,2 ]
Affiliations
[1] Aalto Univ, Sch Sci, Dept Informat & Comp Sci, FI-00076 Aalto, Finland
[2] Turku Univ, Turku Ctr Biotechnol, Turku, Finland
Source
BMC BIOINFORMATICS | 2013 / Vol. 14
Funding
Academy of Finland;
Keywords
SIGNALS;
DOI
10.1186/1471-2105-14-S10-S2
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Transcription factor (TF) binding to DNA can be modeled in a number of different ways. It is highly debated which modeling methods are the best, how the models should be built, and what they can be applied to. In this study, a linear k-mer model proposed for predicting TF specificity in protein binding microarrays (PBM) is applied to high-throughput SELEX data, and the question of how to choose the most informative k-mers for the binding model is studied. We implemented the standard cross-validation scheme to reduce the number of k-mers in the model and observed that the number of k-mers can often be reduced significantly without a great negative effect on prediction accuracy. We also found that the later SELEX enrichment cycles provide much better discrimination between bound and unbound sequences, as model prediction accuracies increased with the cycle number for all proteins. We compared the prediction performance of k-mer and position-specific weight matrix (PWM) models derived from the same SELEX data. Consistent with previous results on PBM data, the performance of the k-mer model was on average 9 percentage points better. For the 15 proteins in the SELEX data set with medium enrichment cycles, classification accuracies were on average 71% and 62% for the k-mer and PWM models, respectively. Finally, the k-mer model trained with SELEX data was evaluated on ChIP-seq data, demonstrating substantial improvements for some proteins. For protein GATA1, the model can distinguish between true ChIP-seq peaks and negative peaks. For proteins RFX3 and NFATC1, the performance of the model was no better than chance.
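To make the idea of a linear k-mer model concrete, here is a minimal Python sketch. It is not the authors' implementation: the toy sequences, the function names, the log-ratio weighting, and the selection of k-mers by absolute weight are all assumptions made for illustration. The abstract only states that the model is linear in k-mers and that cross-validation was used to reduce their number; the fitting procedure is not described there.

```python
from collections import Counter
from math import log

K = 4  # k-mer length (assumed for this toy example)

# Hypothetical toy data, not taken from the paper.
BOUND = ["ACGATAAG", "TTGATAAC", "CAGATAGG", "GGGATATC"]
UNBOUND = ["ACGTTCAG", "TTCCGGAC", "CATTGCGG", "GGCACTTC"]


def kmer_counts(seq, k):
    """Count every overlapping k-mer in a DNA sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def fit_weights(bound, unbound, k, pseudo=1.0):
    """Weight each k-mer by the log ratio of its smoothed frequency
    in bound versus unbound sequences (an assumed, simple stand-in
    for a fitted linear model)."""
    pos, neg = Counter(), Counter()
    for s in bound:
        pos.update(kmer_counts(s, k))
    for s in unbound:
        neg.update(kmer_counts(s, k))
    vocab = set(pos) | set(neg)
    pos_total = sum(pos.values()) + pseudo * len(vocab)
    neg_total = sum(neg.values()) + pseudo * len(vocab)
    return {
        m: log((pos[m] + pseudo) / pos_total) - log((neg[m] + pseudo) / neg_total)
        for m in vocab
    }


def score(seq, weights, k):
    """Linear score: weighted sum of the sequence's k-mer counts."""
    return sum(weights.get(m, 0.0) * c for m, c in kmer_counts(seq, k).items())


def top_kmers(weights, n):
    """Keep only the n k-mers with the largest absolute weight."""
    keep = sorted(weights, key=lambda m: (-abs(weights[m]), m))[:n]
    return {m: weights[m] for m in keep}


def accuracy(bound, unbound, weights, k, threshold=0.0):
    """Fraction of sequences classified correctly when a score above
    the threshold is read as 'bound'."""
    correct = sum(score(s, weights, k) > threshold for s in bound)
    correct += sum(score(s, weights, k) <= threshold for s in unbound)
    return correct / (len(bound) + len(unbound))


weights = fit_weights(BOUND, UNBOUND, K)
reduced = top_kmers(weights, 1)  # a drastically reduced model
full_acc = accuracy(BOUND, UNBOUND, weights, K)
reduced_acc = accuracy(BOUND, UNBOUND, reduced, K)
```

On this toy set the single highest-weight k-mer separates the two classes as well as the full model does, which mirrors, in a very simplified way, the observation that the number of k-mers can often be cut substantially. Accuracy here is measured on the training sequences themselves; a real evaluation would score held-out sequences, as the cross-validation scheme in the study does.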
Pages: 9
Related papers
50 in total
  • [1] Evaluating a linear k-mer model for protein-DNA interactions using high-throughput SELEX data
    Juhani Kähärä
    Harri Lähdesmäki
    BMC Bioinformatics, 14
  • [2] A benchmark study of k-mer counting methods for high-throughput sequencing
    Manekar, Swati C.
    Sathe, Shailesh R.
    GIGASCIENCE, 2018, 7 (12):
  • [3] ChIP-seq: Using high-throughput sequencing to discover protein-DNA interactions
    Schmidt, Dominic
    Wilson, Michael D.
    Spyrou, Christiana
    Brown, Gordon D.
    Hadfield, James
    Odom, Duncan T.
    METHODS, 2009, 48 (03) : 240 - 248
  • [4] Bayesian Analysis of High-Throughput Quantitative Measurement of Protein-DNA Interactions
    Pollock, David D.
    de Koning, A. P. Jason
    Kim, Hyunmin
    Castoe, Todd A.
    Churchill, Mair E. A.
    Kechris, Katerina J.
    PLOS ONE, 2011, 6 (11):
  • [5] High-throughput single-molecule studies of protein-DNA interactions
    Robison, Aaron D.
    Finkelstein, Ilya J.
    FEBS LETTERS, 2014, 588 (19) : 3539 - 3546
  • [6] High-throughput and multiplexed protein array technology: protein-DNA and protein-protein interactions
    Sakanyan, V
    JOURNAL OF CHROMATOGRAPHY B-ANALYTICAL TECHNOLOGIES IN THE BIOMEDICAL AND LIFE SCIENCES, 2005, 815 (1-2) : 77 - 95
  • [7] High-throughput assay for determining specificity and affinity of protein-DNA binding interactions
    Outi Hallikas
    Jussi Taipale
    Nature Protocols, 2006, 1 : 215 - 222
  • [8] High-throughput assay for determining specificity and affinity of protein-DNA binding interactions
    Hallikas, Outi
    Taipale, Jussi
    NATURE PROTOCOLS, 2006, 1 (01) : 215 - 222
  • [9] Designing small universal k-mer hitting sets for improved analysis of high-throughput sequencing
    Orenstein, Yaron
    Pellow, David
    Marcais, Guillaume
    Shamir, Ron
    Kingsford, Carl
    PLOS COMPUTATIONAL BIOLOGY, 2017, 13 (10)
  • [10] High throughput assays for visualizing individual protein-DNA interactions
    Patel, Smita
    Pandey, Manjula
    Syed, Salman
    Ha, Taekjip
    Johnson, Daniel
    Wang, Michelle
    BIOPHYSICAL JOURNAL, 2009, 96 (03) : 193A - 193A