ssHMM: extracting intuitive sequence-structure motifs from high-throughput RNA-binding protein data

Cited by: 24
Authors
Heller, David [1, 2]
Krestel, Ralf [2]
Ohler, Uwe [3]
Vingron, Martin [1]
Marsico, Annalisa [1, 4]
Affiliations
[1] Max Planck Inst Mol Genet, Ihnestr 63-73, D-14195 Berlin, Germany
[2] Hasso Plattner Inst, Prof Dr Helmert Str 2-3, D-14482 Potsdam, Germany
[3] Max Delbruck Ctr, Robert Roessle Str 10, D-13029 Berlin, Germany
[4] Free Univ Berlin, Arnimallee 14, D-14195 Berlin, Germany
Keywords
GENE REGULATORY ELEMENTS; SECONDARY STRUCTURE; DNA; DISCOVERY; SITES; CLIP; MICROPROCESSOR; IDENTIFICATION; RECOGNITION; SPECIFICITY
DOI
10.1093/nar/gkx756
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
RNA-binding proteins (RBPs) play an important role in RNA post-transcriptional regulation and recognize target RNAs via sequence-structure motifs. The extent to which RNA structure influences protein binding in the presence or absence of a sequence motif is still poorly understood. Existing RNA motif finders either take the structure of the RNA only partially into account, or employ models which are not directly interpretable as sequence-structure motifs. We developed ssHMM, an RNA motif finder based on a hidden Markov model (HMM) and Gibbs sampling which fully captures the relationship between RNA sequence and secondary structure preference of a given RBP. Unlike previous methods, which output separate logos for sequence and structure, it directly produces a combined sequence-structure motif when trained on a large set of sequences. ssHMM's model is visualized intuitively as a graph and facilitates biological interpretation. ssHMM can be used to find novel bona fide sequence-structure motifs of uncharacterized RBPs, such as the one presented here for the YY1 protein. ssHMM reaches a high motif recovery rate on synthetic data, recovers known RBP motifs from CLIP-Seq data, and scales linearly with the input size; it is considerably faster than MEMERIS and RNAcontext on large datasets and on par with GraphProt. It is freely available on GitHub and as a Docker image.
Pages: 11004-11018
Page count: 15