RNA secondary structural alignment with conditional random fields

Cited by: 37
Authors
Sato, K [1]
Sakakibara, Y [1]
Affiliations
[1] Keio Univ, Dept Biosci & Informat, Kohoku Ku, Yokohama, Kanagawa 2238522, Japan
DOI
10.1093/bioinformatics/bti1139
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: The computational identification of non-coding RNA regions on the genome is currently receiving much attention. However, it is essentially harder than gene-finding problems for protein-coding regions because non-coding RNA sequences do not have strong statistical signals. Since comparative sequence analysis is effective for non-coding RNA detection, efficient computational methods are expected for structural alignment of RNA sequences. Several methods have been proposed to accomplish the structural alignment tasks for RNA sequences, and we found that one of the most important points is to estimate an accurate score matrix for calculating structural alignments.
Results: We propose a novel approach for RNA structural alignment based on conditional random fields (CRFs). Our approach has some specific features compared with previous methods in the sense that the parameters for structural alignment are estimated such that the model can most probably discriminate between correct alignments and incorrect alignments, and has the generalization ability so that a satisfiable score matrix can be obtained even with a small number of sample data without overfitting. Experimental results clearly show that the parameter estimation with CRFs can outperform all the other existing methods for structural alignments of RNA sequences. Furthermore, structural alignment search based on CRFs is more accurate for predicting non-coding RNA regions than the other scoring methods. These experimental results strongly support our discriminative method employing CRFs to estimate the score matrix parameters.
Pages: 237-242
Page count: 6