DNA sequence plus shape kernel enables alignment-free modeling of transcription factor binding

Cited by: 19
Authors
Ma, Wenxiu [1]
Yang, Lin [2,3,4,5]
Rohs, Remo [2,3,4,5]
Noble, William Stafford [6]
Affiliations
[1] Univ Calif Riverside, Dept Stat, Riverside, CA 92521 USA
[2] Univ Southern Calif, Mol & Computat Biol Program, Dept Biol Sci, Los Angeles, CA 90089 USA
[3] Univ Southern Calif, Mol & Computat Biol Program, Dept Chem, Los Angeles, CA 90089 USA
[4] Univ Southern Calif, Mol & Computat Biol Program, Dept Phys, Los Angeles, CA 90089 USA
[5] Univ Southern Calif, Mol & Computat Biol Program, Dept Comp Sci, Los Angeles, CA 90089 USA
[6] Univ Washington, Dept Genome Sci, Dept Comp Sci & Engn, Seattle, WA 98195 USA
Funding
National Institutes of Health (NIH), USA
Keywords
MOTIF ENVIRONMENT; IN-VIVO; SPECIFICITY; FEATURES; DETERMINANTS; PREDICTION; SELECTION;
DOI
10.1093/bioinformatics/btx336
Chinese Library Classification
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: Transcription factors (TFs) bind to specific DNA sequence motifs. Several lines of evidence suggest that TF-DNA binding is mediated in part by properties of the local DNA shape, such as the width of the minor groove and the relative orientations of adjacent base pairs. Several methods have been developed to jointly account for DNA sequence and shape properties in predicting TF binding affinity. However, a limitation of these methods is that they typically require a training set of aligned TF binding sites.

Results: We describe a sequence + shape kernel that leverages DNA sequence and shape information to better understand protein-DNA binding preference and affinity. This kernel extends an existing class of k-mer-based sequence kernels, building on the recently described di-mismatch kernel. Using three in vitro benchmark datasets, derived from universal protein binding microarrays (uPBMs), genomic context PBMs (gcPBMs) and SELEX-seq data, we demonstrate that incorporating DNA shape information improves our ability to predict protein-DNA binding affinity. In particular, we observe that (i) the k-spectrum + shape model performs better than the classical k-spectrum kernel, particularly for small k values; (ii) the di-mismatch kernel performs better than the k-mer kernel for larger k; and (iii) the di-mismatch + shape kernel performs better than the di-mismatch kernel for intermediate k values.
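For readers unfamiliar with the classical k-spectrum kernel that the abstract uses as its baseline, the following is a minimal sketch of the standard textbook definition: each sequence is mapped to its vector of overlapping k-mer counts, and the kernel value is the inner product of two such vectors. Because no alignment of the sequences is needed, this illustrates the "alignment-free" idea. The function names `kmer_counts` and `spectrum_kernel` are hypothetical, chosen for illustration only. The sketch is not the authors' implementation and does not cover the di-mismatch kernel or the shape features, whose details the abstract does not specify.

```python
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping k-mer in seq (the k-spectrum feature map)."""
    seq = seq.upper()
    # A sequence shorter than k yields an empty range, hence no k-mers.
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(a: str, b: str, k: int) -> int:
    """Inner product of the k-mer count vectors of sequences a and b."""
    counts_a = kmer_counts(a, k)
    counts_b = kmer_counts(b, k)
    # Counter returns 0 for k-mers that are absent from b.
    return sum(n * counts_b[kmer] for kmer, n in counts_a.items())


if __name__ == "__main__":
    # Two short, unaligned DNA sequences compared with k = 2.
    print(spectrum_kernel("ACGTAC", "ACGG", 2))
```

A kernel value computed this way could be supplied to any kernel-based learner, for example as a precomputed Gram matrix; the choice of learner here is an assumption and is not stated in the abstract.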
Pages: 3003-3010
Page count: 8