DNA sequence plus shape kernel enables alignment-free modeling of transcription factor binding

Cited by: 19
Authors
Ma, Wenxiu [1]
Yang, Lin [2, 3, 4, 5]
Rohs, Remo [2, 3, 4, 5]
Noble, William Stafford [6]
Affiliations
[1] Univ Calif Riverside, Dept Stat, Riverside, CA 92521 USA
[2] Univ Southern Calif, Mol & Computat Biol Program, Dept Biol Sci, Los Angeles, CA 90089 USA
[3] Univ Southern Calif, Mol & Computat Biol Program, Dept Chem, Los Angeles, CA 90089 USA
[4] Univ Southern Calif, Mol & Computat Biol Program, Dept Phys, Los Angeles, CA 90089 USA
[5] Univ Southern Calif, Mol & Computat Biol Program, Dept Comp Sci, Los Angeles, CA 90089 USA
[6] Univ Washington, Dept Genome Sci, Dept Comp Sci & Engn, Seattle, WA 98195 USA
Funding
National Institutes of Health (USA)
Keywords
MOTIF ENVIRONMENT; IN-VIVO; SPECIFICITY; FEATURES; DETERMINANTS; PREDICTION; SELECTION;
DOI
10.1093/bioinformatics/btx336
Chinese Library Classification number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Motivation: Transcription factors (TFs) bind to specific DNA sequence motifs. Several lines of evidence suggest that TF-DNA binding is mediated in part by properties of the local DNA shape: the width of the minor groove, the relative orientations of adjacent base pairs, etc. Several methods have been developed to jointly account for DNA sequence and shape properties in predicting TF binding affinity. However, a limitation of these methods is that they typically require a training set of aligned TF binding sites.
Results: We describe a sequence + shape kernel that leverages DNA sequence and shape information to better understand protein-DNA binding preference and affinity. This kernel extends an existing class of k-mer based sequence kernels, based on the recently described di-mismatch kernel. Using three in vitro benchmark datasets, derived from universal protein binding microarrays (uPBMs), genomic context PBMs (gcPBMs) and SELEX-seq data, we demonstrate that incorporating DNA shape information improves our ability to predict protein-DNA binding affinity. In particular, we observe that (i) the k-spectrum + shape model performs better than the classical k-spectrum kernel, particularly for small k values; (ii) the di-mismatch kernel performs better than the k-mer kernel, for larger k; and (iii) the di-mismatch + shape kernel performs better than the di-mismatch kernel for intermediate k values.
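For orientation, the classical k-spectrum kernel that the abstract uses as a baseline can be sketched as below. This is a minimal illustration of the textbook definition (the inner product of overlapping k-mer count vectors), not the authors' implementation; the function names `spectrum_features` and `spectrum_kernel` are hypothetical, and the shape features and di-mismatch extension described in the abstract are not modeled here.

```python
from collections import Counter


def spectrum_features(seq: str, k: int) -> Counter:
    """Count every overlapping k-mer in seq (hypothetical helper)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(a: str, b: str, k: int) -> int:
    """Inner product of the k-mer count vectors of a and b.

    No alignment of the two sequences is needed: only k-mer
    counts are compared, which is what makes the approach
    alignment-free.
    """
    fa = spectrum_features(a, k)
    fb = spectrum_features(b, k)
    # Counter returns 0 for k-mers absent from b.
    return sum(count * fb[kmer] for kmer, count in fa.items())


if __name__ == "__main__":
    print(spectrum_kernel("ACGTACGT", "TTACGTAA", 3))
```

Because the kernel depends only on k-mer counts, the two input sequences may differ in length and need no shared coordinate system.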
Pages: 3003-3010
Page count: 8
Related papers
50 in total
  • [31] Correction: Corrigendum: ChEC-seq kinetics discriminates transcription factor binding sites by DNA sequence and shape in vivo
    Zentner, Gabriel E.
    Kasinathan, Sivakanthan
    Xin, Beibei
    Rohs, Remo
    Henikoff, Steven
    Nature Communications, 8
  • [32] Co-SELECT reveals sequence non-specific contribution of DNA shape to transcription factor binding in vitro
    Pal, Soumitra
    Hoinka, Jan
    Przytycka, Teresa M.
    NUCLEIC ACIDS RESEARCH, 2019, 47 (13) : 6632 - 6641
  • [33] MLDSP-GUI: an alignment-free standalone tool with an interactive graphical user interface for DNA sequence comparison and analysis
    Randhawa, Gurjit S.
    Hill, Kathleen A.
    Kari, Lila
    BIOINFORMATICS, 2020, 36 (07) : 2258 - 2259
  • [34] DNA Shape Features Improve Transcription Factor Binding Site Predictions In Vivo
    Mathelier, Anthony
    Xin, Beibei
    Chiu, Tsu-Pei
    Yang, Lin
    Rohs, Remo
    Wasserman, Wyeth W.
    CELL SYSTEMS, 2016, 3 (03) : 278 - +
  • [35] TFBSshape: a motif database for DNA shape features of transcription factor binding sites
    Yang, Lin
    Dror, Iris
    Zhou, Tianyin
    Mathelier, Anthony
    Wasserman, Wyeth W.
    Gordan, Raluca
    Rohs, Remo
    JOURNAL OF BIOMOLECULAR STRUCTURE & DYNAMICS, 2015, 33 : 9 - 9
  • [36] AliBiMotif: Integrating alignment and biclustering to unravel Transcription Factor Binding Sites in DNA sequences
    Goncalves, Joana P.
    Moreau, Yves
    Madeira, Sara C.
    INTERNATIONAL JOURNAL OF DATA MINING AND BIOINFORMATICS, 2012, 6 (02) : 196 - 215
  • [37] TFBSshape: a motif database for DNA shape features of transcription factor binding sites
    Yang, Lin
    Zhou, Tianyin
    Dror, Iris
    Mathelier, Anthony
    Wasserman, Wyeth W.
    Gordan, Raluca
    Rohs, Remo
    NUCLEIC ACIDS RESEARCH, 2014, 42 (D1) : D148 - D155
  • [38] Systematic Evaluation of DNA Sequence Variations on in vivo Transcription Factor Binding Affinity
    Jin, Yutong
    Jiang, Jiahui
    Wang, Ruixuan
    Qin, Zhaohui S.
    FRONTIERS IN GENETICS, 2021, 12
  • [39] Publisher Correction: Inference of transcription factor binding from cell-free DNA enables tumor subtype prediction and early detection
    Ulz, Peter
    Perakis, Samantha
    Zhou, Qing
    Moser, Tina
    Belic, Jelena
    Lazzeri, Isaac
    Wölfler, Albert
    Zebisch, Armin
    Gerger, Armin
    Pristauz, Gunda
    Petru, Edgar
    White, Brandon
    Roberts, Charles E. S.
    St. John, John
    Schimek, Michael G.
    Geigl, Jochen B.
    Bauernhofer, Thomas
    Sill, Heinz
    Bock, Christoph
    Heitzer, Ellen
    Speicher, Michael R.
    Nature Communications, 11
  • [40] Alignment-free Sequence Searching over Whole Genomes Using 3D Random plot of Query DNA Sequences
    Lee, Da-Young
    Tak, Hae-Sung
    Kim, Han-Ho
    Cho, Hwan-Gue
    INFORMATICA-JOURNAL OF COMPUTING AND INFORMATICS, 2018, 42 (03) : 357 - 368