DNA sequence plus shape kernel enables alignment-free modeling of transcription factor binding

Cited by: 19
Authors
Ma, Wenxiu [1]
Yang, Lin [2,3,4,5]
Rohs, Remo [2,3,4,5]
Noble, William Stafford [6]
Affiliations
[1] Univ Calif Riverside, Dept Stat, Riverside, CA 92521 USA
[2] Univ Southern Calif, Mol & Computat Biol Program, Dept Biol Sci, Los Angeles, CA 90089 USA
[3] Univ Southern Calif, Mol & Computat Biol Program, Dept Chem, Los Angeles, CA 90089 USA
[4] Univ Southern Calif, Mol & Computat Biol Program, Dept Phys, Los Angeles, CA 90089 USA
[5] Univ Southern Calif, Mol & Computat Biol Program, Dept Comp Sci, Los Angeles, CA 90089 USA
[6] Univ Washington, Dept Genome Sci, Dept Comp Sci & Engn, Seattle, WA 98195 USA
Funding
National Institutes of Health (NIH), USA;
Keywords
MOTIF ENVIRONMENT; IN-VIVO; SPECIFICITY; FEATURES; DETERMINANTS; PREDICTION; SELECTION;
DOI
10.1093/bioinformatics/btx336
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification codes
071010 ; 081704 ;
Abstract
Motivation: Transcription factors (TFs) bind to specific DNA sequence motifs. Several lines of evidence suggest that TF-DNA binding is mediated in part by properties of the local DNA shape: the width of the minor groove, the relative orientations of adjacent base pairs, etc. Several methods have been developed to jointly account for DNA sequence and shape properties in predicting TF binding affinity. However, a limitation of these methods is that they typically require a training set of aligned TF binding sites.

Results: We describe a sequence + shape kernel that leverages DNA sequence and shape information to better understand protein-DNA binding preference and affinity. This kernel extends an existing class of k-mer based sequence kernels, based on the recently described di-mismatch kernel. Using three in vitro benchmark datasets, derived from universal protein binding microarrays (uPBMs), genomic context PBMs (gcPBMs) and SELEX-seq data, we demonstrate that incorporating DNA shape information improves our ability to predict protein-DNA binding affinity. In particular, we observe that (i) the k-spectrum + shape model performs better than the classical k-spectrum kernel, particularly for small k values; (ii) the di-mismatch kernel performs better than the k-mer kernel, for larger k; and (iii) the di-mismatch + shape kernel performs better than the di-mismatch kernel for intermediate k values.
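As a point of reference for the "classical k-spectrum kernel" named in the abstract, the following is a minimal sketch of that baseline only: the kernel value is the inner product of the two sequences' k-mer count vectors, which is why no alignment of binding sites is needed. It does not implement the di-mismatch or sequence + shape kernels described in the paper, and the function names `kmer_counts` and `spectrum_kernel` are illustrative choices, not taken from the authors' software.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping k-mer in seq (hypothetical helper name)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(x, y, k):
    """k-spectrum kernel: inner product of the k-mer count vectors of x and y.

    Illustrative baseline only; DNA shape features are not included here.
    """
    cx, cy = kmer_counts(x, k), kmer_counts(y, k)
    # Counter returns 0 for k-mers absent from y, so unshared k-mers add nothing.
    return sum(n * cy[kmer] for kmer, n in cx.items())


if __name__ == "__main__":
    # Two unaligned probe-like sequences; the kernel needs no alignment.
    print(spectrum_kernel("ACGTACGT", "CGTACGTT", 3))
```

Because the kernel depends only on k-mer counts, shifting a motif within a probe leaves the value largely unchanged, which is the sense in which such models are alignment-free.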
Pages: 3003-3010
Number of pages: 8