RNAlight: a machine learning model to identify nucleotide features determining RNA subcellular localization

Cited by: 21
Authors
Yuan, Guo-Hua [1]
Wang, Ying [1]
Wang, Guang-Zhong [1]
Yang, Li
Affiliations
[1] Chinese Acad Sci, Shanghai Inst Nutr & Hlth, Beijing, Peoples R China
Funding
National Natural Science Foundation of China; China Postdoctoral Science Foundation
Keywords
RNA localization; machine learning; nucleotide feature; motif; RNA binding protein; circular RNA; LONG NONCODING RNAS; MESSENGER-RNA; TRANSCRIPTION; MECHANISMS; REPEATS;
DOI
10.1093/bib/bbac509
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Different RNAs have distinct subcellular localizations. However, nucleotide features that determine these distinct distributions of lncRNAs and mRNAs have yet to be fully addressed. Here, we develop RNAlight, a machine learning model based on LightGBM, to identify nucleotide k-mers contributing to the subcellular localizations of mRNAs and lncRNAs. With the Tree SHAP algorithm, RNAlight extracts nucleotide features for cytoplasmic or nuclear localization of RNAs, indicating the sequence basis for distinct RNA subcellular localizations. By assembling k-mers to sequence features and subsequently mapping to known RBP-associated motifs, different types of sequence features and their associated RBPs were additionally uncovered for lncRNAs and mRNAs with distinct subcellular localizations. Finally, we extended RNAlight to precisely predict the subcellular localizations of other types of RNAs, including snRNAs, snoRNAs and different circular RNA transcripts, suggesting the generality of using RNAlight for RNA subcellular localization prediction.
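The abstract names nucleotide k-mers as the model's input features. As a minimal sketch of what such a featurization could look like, the Python function below turns a sequence into normalized k-mer frequencies. The function name `kmer_frequencies`, the choice of k, the T-to-U conversion, and the frequency normalization are all illustrative assumptions, not details taken from the paper. Vectors like these could then be passed to a gradient-boosted tree classifier such as LightGBM and interpreted with Tree SHAP, neither of which is shown here.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(seq, k=3):
    """Return normalized k-mer frequencies over the RNA alphabet ACGU.

    Hypothetical helper for illustration; k and the normalization
    are assumptions, not the settings used by RNAlight.
    """
    # Treat DNA-style input as RNA so both notations map to one alphabet.
    seq = seq.upper().replace("T", "U")
    # Enumerate all 4**k possible k-mers so every sequence yields
    # a feature vector with the same keys.
    kmers = ["".join(p) for p in product("ACGU", repeat=k)]
    # Count every overlapping window of length k.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Windows containing other symbols (e.g. N) are excluded from the total.
    total = sum(counts[km] for km in kmers)
    return {km: (counts[km] / total if total else 0.0) for km in kmers}


if __name__ == "__main__":
    features = kmer_frequencies("AUGAUG", k=2)
    print({km: f for km, f in features.items() if f > 0})
```

Enumerating the full k-mer vocabulary up front keeps the feature order fixed across sequences, which is what a tabular learner expects.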
Pages: 13