Porpoise: a new approach for accurate prediction of RNA pseudouridine sites

Cited by: 44
Authors
Li, Fuyi [1]
Guo, Xudong [2]
Jin, Peipei [3]
Chen, Jinxiang [4]
Xiang, Dongxu [5]
Song, Jiangning [6, 7]
Coin, Lachlan J. M. [8, 9]
Affiliations
[1] Univ Melbourne, Peter Doherty Inst Infect & Immun, Dept Microbiol & Immunol, 792 Elizabeth St, Melbourne, Vic 3000, Australia
[2] Ningxia Univ, Yinchuan, Ningxia, Peoples R China
[3] Shanghai Jiao Tong Univ, Dept Clin Lab, Ruijin Hosp, Sch Med, Shanghai, Peoples R China
[4] Northwest A&F Univ, Xianyang, Peoples R China
[5] Univ Melbourne, Fac Engn & Informat Technol, Melbourne, Vic, Australia
[6] Monash Univ, Monash Biomed Discovery Inst, Melbourne, Vic, Australia
[7] Monash Univ, Monash Data Futures Inst, Melbourne, Vic, Australia
[8] Univ Melbourne, Dept Microbiol & Immunol, Melbourne, Vic, Australia
[9] Univ Melbourne, Dept Clin Pathol, Melbourne, Vic, Australia
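Funding
UK Medical Research Council; US National Institutes of Health; Australian Research Council; National Health and Medical Research Council of Australia
Keywords
RNA pseudouridine site; bioinformatics; sequence analysis; machine learning; stacking ensemble learning; YEAST; MODEL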
DOI
10.1093/bib/bbab245
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Pseudouridine is a ubiquitous RNA modification present in both eukaryotes and prokaryotes, and it plays a vital role in various biological processes. Almost all kinds of RNA are subject to this modification. However, identifying pseudouridine sites experimentally remains a great challenge because the required experiments are expensive and time-consuming. Computational approaches that can accurately identify pseudouridine sites in silico from large amounts of RNA sequence data are therefore highly desirable and can aid the functional elucidation of this critical modification. Here, we propose a new computational approach, termed Porpoise, to accurately identify pseudouridine sites from RNA sequence data. Porpoise builds upon a comprehensive evaluation of 18 frequently used feature encoding schemes, from which four types of features were selected: binary features, pseudo k-tuple composition, nucleotide chemical property and position-specific trinucleotide propensity based on single-strand (PSTNPss). The selected features are fed into a stacked ensemble learning framework to construct an effective stacked model. Both cross-validation tests on the benchmark dataset and independent tests show that Porpoise achieves better predictive performance than several state-of-the-art approaches. The application of model interpretation tools demonstrates the importance of PSTNPss for the performance of the trained models. This new method is anticipated to facilitate community-wide efforts to identify putative pseudouridine sites and formulate novel testable biological hypotheses.
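As a rough illustration of two of the feature types named in the abstract, the sketch below encodes an RNA sequence with binary (one-hot) features and with nucleotide chemical property (NCP) features. It is not taken from the Porpoise code: the function names are hypothetical, the one-hot column order (A, C, G, U) is an assumption, and the NCP triples follow a commonly used convention (ring structure, functional group, hydrogen bonding) that the abstract itself does not specify.

```python
# Illustrative sketch only: these tables follow common conventions and are
# assumptions, not values taken from the Porpoise source code.

# One-hot ("binary") encoding; the A, C, G, U column order is assumed.
BINARY = {
    "A": (1, 0, 0, 0),
    "C": (0, 1, 0, 0),
    "G": (0, 0, 1, 0),
    "U": (0, 0, 0, 1),
}

# Nucleotide chemical property triples, assumed convention:
# (purine ring = 1, amino group = 1, weak hydrogen bonding = 1).
NCP = {
    "A": (1, 1, 1),
    "C": (0, 1, 0),
    "G": (1, 0, 0),
    "U": (0, 0, 1),
}


def encode(seq, table):
    """Concatenate the per-nucleotide vectors from `table` into one flat list."""
    seq = seq.upper().replace("T", "U")  # accept DNA-style input as well
    features = []
    for base in seq:
        if base not in table:
            raise ValueError(f"unsupported nucleotide: {base!r}")
        features.extend(table[base])
    return features


def encode_binary(seq):
    """Hypothetical helper: 4 features per nucleotide."""
    return encode(seq, BINARY)


def encode_ncp(seq):
    """Hypothetical helper: 3 features per nucleotide."""
    return encode(seq, NCP)
```

For a fixed-length window around a candidate uridine, the two vectors could be concatenated and passed to the base learners of a stacked ensemble; the choice of learners and window length is left open here because the abstract does not state them.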
Pages: 12