Porpoise: a new approach for accurate prediction of RNA pseudouridine sites

Cited by: 44
Authors
Li, Fuyi [1]
Guo, Xudong [2]
Jin, Peipei [3]
Chen, Jinxiang [4]
Xiang, Dongxu [5]
Song, Jiangning [6, 7]
Coin, Lachlan J. M. [8, 9]
Affiliations
[1] Univ Melbourne, Peter Doherty Inst Infect & Immun, Dept Microbiol & Immunol, 792 Elizabeth St, Melbourne, Vic 3000, Australia
[2] Ningxia Univ, Yinchuan, Ningxia, Peoples R China
[3] Shanghai Jiao Tong Univ, Dept Clin Lab, Ruijin Hosp, Sch Med, Shanghai, Peoples R China
[4] Northwest A&F Univ, Xianyang, Peoples R China
[5] Univ Melbourne, Fac Engn & Informat Technol, Melbourne, Vic, Australia
[6] Monash Univ, Monash Biomed Discovery Inst, Melbourne, Vic, Australia
[7] Monash Univ, Monash Data Futures Inst, Melbourne, Vic, Australia
[8] Univ Melbourne, Dept Microbiol & Immunol, Melbourne, Vic, Australia
[9] Univ Melbourne, Dept Clin Pathol, Melbourne, Vic, Australia
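Funding
UK Medical Research Council; US National Institutes of Health; Australian Research Council; National Health and Medical Research Council of Australia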
Keywords
RNA pseudouridine site; bioinformatics; sequence analysis; machine learning; stacking ensemble learning; YEAST; MODEL
DOI
10.1093/bib/bbab245
Chinese Library Classification
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Pseudouridine is a ubiquitous RNA modification present in both eukaryotes and prokaryotes, and it plays a vital role in various biological processes. Almost all kinds of RNA are subject to this modification. However, identifying pseudouridine sites experimentally remains a great challenge, because it requires expensive and time-consuming experimental work. Computational approaches that can accurately identify pseudouridine sites in silico from the large amount of available RNA sequence data are therefore highly desirable and can aid in the functional elucidation of this critical modification. Here, we propose a new computational approach, termed Porpoise, to accurately identify pseudouridine sites from RNA sequence data. Porpoise builds upon a comprehensive evaluation of 18 frequently used feature encoding schemes, from which four types of features are selected: binary features, pseudo k-tuple composition, nucleotide chemical property and position-specific trinucleotide propensity based on single-strand (PSTNPss). The selected features are fed into a stacked ensemble learning framework to construct an effective stacked model. Both cross-validation tests on the benchmark dataset and independent tests show that Porpoise achieves better predictive performance than several state-of-the-art approaches. The application of model interpretation tools demonstrates the importance of PSTNPs for the performance of the trained models. This new method is anticipated to facilitate community-wide efforts to identify putative pseudouridine sites and formulate novel testable biological hypotheses.
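To make two of the four feature types named in the abstract concrete, the following is a minimal Python sketch of binary (one-hot) and nucleotide chemical property (NCP) encoding for a short RNA window. It is an illustration under stated assumptions, not the Porpoise implementation: the function names `encode` and `encode_window` are hypothetical, the NCP triplets follow the convention commonly used in the RNA modification prediction literature (ring structure, functional group, hydrogen bonding), and the handling of unknown bases as all-zero vectors is a choice made here for simplicity.

```python
# Illustrative encodings only; the exact vector layout used by Porpoise
# is not given in the abstract and is assumed here.

# Binary (one-hot) encoding: one position per nucleotide.
BINARY = {
    "A": (1, 0, 0, 0),
    "C": (0, 1, 0, 0),
    "G": (0, 0, 1, 0),
    "U": (0, 0, 0, 1),
}

# Nucleotide chemical property (NCP) encoding, commonly given as
# (ring structure, functional group, hydrogen bonding).
NCP = {
    "A": (1, 1, 1),
    "C": (0, 1, 0),
    "G": (1, 0, 0),
    "U": (0, 0, 1),
}


def encode(seq, table):
    """Concatenate per-nucleotide vectors from `table` into one flat list.

    DNA-style input is accepted by mapping T to U. Bases missing from the
    table (for example N) become all-zero vectors; this is an assumption.
    """
    seq = seq.upper().replace("T", "U")
    width = len(next(iter(table.values())))
    features = []
    for base in seq:
        features.extend(table.get(base, (0,) * width))
    return features


def encode_window(seq):
    """Hypothetical helper: binary features followed by NCP features."""
    return encode(seq, BINARY) + encode(seq, NCP)
```

A window of length n yields 4n binary values and 3n NCP values, so `encode_window` returns a vector of length 7n. In a stacked ensemble such as the one the abstract describes, vectors of this kind would be passed to base classifiers whose outputs feed a second-level model.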
Pages: 12