Evaluating a linear k-mer model for protein-DNA interactions using high-throughput SELEX data

Cited by: 6
Authors
Kahara, Juhani [1 ]
Lahdesmaki, Harri [1 ,2 ]
Affiliations
[1] Aalto Univ, Sch Sci, Dept Informat & Comp Sci, FI-00076 Aalto, Finland
[2] Turku Univ, Turku Ctr Biotechnol, Turku, Finland
Source
BMC BIOINFORMATICS | 2013 / Vol. 14
Funding
Academy of Finland;
Keywords
SIGNALS;
DOI
10.1186/1471-2105-14-S10-S2
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Transcription factor (TF) binding to DNA can be modeled in a number of different ways. It is highly debated which modeling methods are the best, how the models should be built and what they can be applied to. In this study, a linear k-mer model proposed for predicting TF specificity in protein binding microarrays (PBM) is applied to high-throughput SELEX data, and the question of how to choose the most informative k-mers for the binding model is studied. We implemented the standard cross-validation scheme to reduce the number of k-mers in the model and observed that the number of k-mers can often be reduced significantly without a great negative effect on prediction accuracy. We also found that the later SELEX enrichment cycles provide a much better discrimination between bound and unbound sequences, as model prediction accuracies increased for all proteins together with the cycle number. We compared the prediction performance of k-mer and position-specific weight matrix (PWM) models derived from the same SELEX data. Consistent with previous results on PBM data, the performance of the k-mer model was on average 9 percentage points better. For the 15 proteins in the SELEX data set with medium enrichment cycles, classification accuracies were on average 71% and 62% for the k-mer and PWM models, respectively. Finally, the k-mer model trained with SELEX data was evaluated on ChIP-seq data, demonstrating substantial improvements for some proteins. For protein GATA1 the model can distinguish between true ChIP-seq peaks and negative peaks. For proteins RFX3 and NFATC1 the performance of the model was no better than chance.
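As a rough illustration of the linear k-mer model described in the abstract, the following minimal Python sketch scores a sequence as a bias plus a weighted sum of its k-mer counts, then keeps only the highest-weight k-mers. Everything here is an assumption made for illustration: the function names, the toy sequences, the perceptron used for fitting, and the selection of k-mers by absolute weight are not taken from the paper, which reduces the k-mer set using cross-validation.

```python
from collections import Counter
from itertools import product


def kmer_counts(seq, k):
    """Count every overlapping k-mer in a DNA sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def linear_score(seq, weights, k, bias=0.0):
    """Linear k-mer model: bias plus the weighted sum of k-mer counts."""
    counts = kmer_counts(seq, k)
    return bias + sum(weights.get(kmer, 0.0) * n for kmer, n in counts.items())


def train_perceptron(bound, unbound, k, epochs=20):
    """Fit weights with a plain perceptron.

    Assumption for illustration only; this is not the fitting
    procedure used in the paper.
    """
    weights = {"".join(p): 0.0 for p in product("ACGT", repeat=k)}
    bias = 0.0
    data = [(s, 1) for s in bound] + [(s, -1) for s in unbound]
    for _ in range(epochs):
        for seq, label in data:
            # Update only on misclassified (or zero-score) examples.
            if label * linear_score(seq, weights, k, bias) <= 0:
                for kmer, n in kmer_counts(seq, k).items():
                    weights[kmer] = weights.get(kmer, 0.0) + label * n
                bias += label
    return weights, bias


def top_kmers(weights, n):
    """Keep only the n k-mers with the largest absolute weight."""
    keep = sorted(weights, key=lambda km: abs(weights[km]), reverse=True)[:n]
    return {km: weights[km] for km in keep}


# Hypothetical toy data: "bound" sequences contain a GATA-like site.
bound = ["AGATAA", "TGATAG", "CGATAC"]
unbound = ["ACCTCA", "TTGCGT", "CCAGTC"]
weights, bias = train_perceptron(bound, unbound, k=3)
```

A positive `linear_score` is read as "bound" and a negative one as "unbound". In practice, the number of k-mers retained by `top_kmers` would be chosen by held-out accuracy rather than fixed in advance.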
Pages: 9
Related papers
50 in total
  • [21] Similarity analysis of protein sequences using a reduced k-mer amino acid model
    Wen, Jia
    Zhang, Yuyan
    Wang, Huanxu
    COMMUNICATIONS IN INFORMATION AND SYSTEMS, 2020, 20 (01) : 45 - 60
  • [22] Real-time, high throughput screening of protein-DNA interactions.
    Zhang, ZR
    Hughes, MD
    Hine, AV
    ABSTRACTS OF PAPERS OF THE AMERICAN CHEMICAL SOCIETY, 2002, 224 : U239 - U239
  • [23] Soft DNA Curtains: high-throughput SM biophysics method to investigate protein-DNA interaction
    Kopustas, Aurimas
    Rakickas, Tomas
    Paksaite, Juste
    Poceviciute, Ernesta
    Karvelis, Tautvydas
    Zaremba, Mindaugas
    Manakova, Elena
    Tutkus, Marijonas
    EUROPEAN BIOPHYSICS JOURNAL WITH BIOPHYSICS LETTERS, 2021, 50 (SUPPL 1) : 190 - 190
  • [24] The effect of prior assumptions over the weights in BayesPI with application to study protein-DNA interactions from ChIP-based high-throughput data
    Junbai Wang
    BMC Bioinformatics, 11
  • [25] The effect of prior assumptions over the weights in BayesPI with application to study protein-DNA interactions from ChIP-based high-throughput data
    Wang, Junbai
    BMC BIOINFORMATICS, 2010, 11
  • [26] High-throughput prediction of minor groove electrostatic potential in studies of protein-DNA recognition
    Rohs, Remo
    ABSTRACTS OF PAPERS OF THE AMERICAN CHEMICAL SOCIETY, 2014, 248
  • [27] High-throughput prediction of minor groove electrostatic potential in studies of protein-DNA recognition
    Chiu, Tsu-Pei
    Rohs, Remo
    ABSTRACTS OF PAPERS OF THE AMERICAN CHEMICAL SOCIETY, 2016, 251
  • [28] DNB-based on-chip motif finding: A high-throughput method to profile different types of protein-DNA interactions
    Li, Zhuokun
    Wang, Xiaojue
    Xu, Dongyang
    Zhang, Dengwei
    Wang, Dan
    Dai, Xuechen
    Wang, Qi
    Li, Zhou
    Gu, Ying
    Ouyang, Wenjie
    Zhao, Shuchang
    Huang, Baoqian
    Gong, Jian
    Zhao, Jing
    Chen, Ao
    Shen, Yue
    Dong, Yuliang
    Zhang, Wenwei
    Xu, Xun
    Xu, Chongjun
    Jiang, Yuan
    SCIENCE ADVANCES, 2020, 6 (31)
  • [29] Microspotting streptavidin and double-stranded DNA Arrays on gold for high-throughput studies of protein-DNA interactions by surface plasmon resonance microscopy
    Shumaker-Parry, JS
    Zareie, MH
    Aebersold, R
    Campbell, CT
    ANALYTICAL CHEMISTRY, 2004, 76 (04) : 918 - 929
  • [30] Systematic assessment of high-throughput experimental data for reliable protein interactions using network topology
    Chen, J
    Hsu, W
    Lee, ML
    Ng, SK
    ICTAI 2004: 16TH IEEE INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE, PROCEEDINGS, 2004 : 368 - 372