Evaluating a linear k-mer model for protein-DNA interactions using high-throughput SELEX data

Cited by: 6
Authors
Kahara, Juhani [1 ]
Lahdesmaki, Harri [1 ,2 ]
Affiliations
[1] Aalto Univ, Sch Sci, Dept Informat & Comp Sci, FI-00076 Aalto, Finland
[2] Turku Univ, Turku Ctr Biotechnol, Turku, Finland
Source
BMC BIOINFORMATICS | 2013 / Vol. 14
Funding
Academy of Finland;
Keywords
SIGNALS;
DOI
10.1186/1471-2105-14-S10-S2
Chinese Library Classification number
Q5 [Biochemistry];
Subject classification code
071010 ; 081704 ;
Abstract
Transcription factor (TF) binding to DNA can be modeled in a number of different ways. It is highly debated which modeling methods are the best, how the models should be built, and what they can be applied to. In this study, a linear k-mer model proposed for predicting TF specificity in protein binding microarrays (PBM) is applied to high-throughput SELEX data, and the question of how to choose the most informative k-mers for the binding model is studied. We implemented the standard cross-validation scheme to reduce the number of k-mers in the model and observed that the number of k-mers can often be reduced significantly without a great negative effect on prediction accuracy. We also found that the later SELEX enrichment cycles provide much better discrimination between bound and unbound sequences, as model prediction accuracies increased with the cycle number for all proteins. We compared the prediction performance of k-mer and position-specific weight matrix (PWM) models derived from the same SELEX data. Consistent with previous results on PBM data, the performance of the k-mer model was on average 9 percentage points better. For the 15 proteins in the SELEX data set with medium enrichment cycles, classification accuracies were on average 71% and 62% for the k-mer and PWM models, respectively. Finally, the k-mer model trained with SELEX data was evaluated on ChIP-seq data, demonstrating substantial improvements for some proteins. For protein GATA1, the model can distinguish between true ChIP-seq peaks and negative peaks. For proteins RFX3 and NFATC1, the performance of the model was no better than chance.
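To make the idea of a linear k-mer model concrete, the following is a minimal sketch, not the authors' implementation. It assumes that a sequence is represented by counts of its overlapping k-mers and that the binding score is a weighted sum of those counts. The function names `kmer_counts` and `linear_score`, the `bias` parameter, and the weights in `toy_weights` are all hypothetical and chosen only for illustration; in the study the weights would be fitted to SELEX data.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping k-mer in a DNA sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def linear_score(seq, weights, k, bias=0.0):
    """Linear model: bias plus the sum of weight * count over k-mers.

    k-mers missing from `weights` contribute nothing, which is how a
    reduced model (fewer k-mers) would behave in this sketch.
    """
    counts = kmer_counts(seq, k)
    return bias + sum(weights.get(kmer, 0.0) * n for kmer, n in counts.items())


# Hypothetical weights for illustration only (not fitted to any data).
toy_weights = {"GATA": 1.5, "ATAA": 0.5, "CCCC": -1.0}

bound_like = "TTGATAAG"    # contains GATA and ATAA
unbound_like = "TTCCCCAG"  # contains CCCC

print(linear_score(bound_like, toy_weights, k=4))
print(linear_score(unbound_like, toy_weights, k=4))
```

Under these assumptions, a sequence would be classified as bound when its score exceeds a chosen threshold, and removing a k-mer from the model amounts to deleting its entry from the weight dictionary.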
Pages: 9