Evaluating a linear k-mer model for protein-DNA interactions using high-throughput SELEX data

Cited by: 6
Authors
Kahara, Juhani [1]
Lahdesmaki, Harri [1, 2]
Affiliations
[1] Aalto Univ, Sch Sci, Dept Informat & Comp Sci, FI-00076 Aalto, Finland
[2] Turku Univ, Turku Ctr Biotechnol, Turku, Finland
Source
BMC BIOINFORMATICS | 2013 / Vol. 14
Funding
Academy of Finland;
Keywords
SIGNALS;
DOI
10.1186/1471-2105-14-S10-S2
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Transcription factor (TF) binding to DNA can be modeled in a number of different ways. It is highly debated which modeling methods are best, how the models should be built, and what they can be applied to. In this study, a linear k-mer model proposed for predicting TF specificity in protein binding microarrays (PBM) is applied to high-throughput SELEX data, and the question of how to choose the most informative k-mers for the binding model is studied. We implemented a standard cross-validation scheme to reduce the number of k-mers in the model and observed that the number of k-mers can often be reduced significantly without a great negative effect on prediction accuracy. We also found that the later SELEX enrichment cycles provide much better discrimination between bound and unbound sequences, as model prediction accuracies increased with the cycle number for all proteins. We compared the prediction performance of k-mer and position-specific weight matrix (PWM) models derived from the same SELEX data. Consistent with previous results on PBM data, the performance of the k-mer model was on average 9 percentage points better. For the 15 proteins in the SELEX data set with medium enrichment cycles, classification accuracies were on average 71% and 62% for the k-mer and PWM models, respectively. Finally, the k-mer model trained with SELEX data was evaluated on ChIP-seq data, demonstrating substantial improvements for some proteins. For protein GATA1, the model can distinguish between true ChIP-seq peaks and negative peaks. For proteins RFX3 and NFATC1, the performance of the model was no better than chance.
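As a rough illustration of what a linear k-mer model computes, the sketch below scores a DNA sequence as a bias term plus a weighted sum of its overlapping k-mer counts. It is not the authors' implementation: the function names, the choice k = 4, and the weights in `example_weights` are all hypothetical and chosen only for illustration. In the study the weights would be learned from SELEX data, and the set of k-mers would be pruned by cross-validation.

```python
from collections import Counter
from itertools import product


def kmer_counts(seq, k):
    """Count every overlapping k-mer in a DNA sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def feature_vector(seq, k, alphabet="ACGT"):
    """Fixed-order count vector over all 4**k possible k-mers."""
    counts = kmer_counts(seq, k)
    return [counts["".join(p)] for p in product(alphabet, repeat=k)]


def linear_score(seq, weights, k, bias=0.0):
    """Linear k-mer model: bias plus the weighted sum of k-mer counts.

    k-mers absent from `weights` contribute nothing, which mirrors a
    model that keeps only a reduced set of informative k-mers.
    """
    counts = kmer_counts(seq, k)
    return bias + sum(weights.get(kmer, 0.0) * n for kmer, n in counts.items())


# Hypothetical weights for illustration only (not fitted to any data).
example_weights = {"GATA": 1.5, "ATAA": 0.5, "CCCC": -1.0}

bound_like = linear_score("TTGATAAGG", example_weights, k=4)
unbound_like = linear_score("TTCCCCAGG", example_weights, k=4)
```

A sequence would then be classified as bound or unbound by comparing its score against a threshold; how that threshold and the weights are chosen is outside the scope of this sketch.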
Pages: 9