RNAlight: a machine learning model to identify nucleotide features determining RNA subcellular localization

Cited by: 21
Authors
Yuan, Guo-Hua [1 ]
Wang, Ying [1 ]
Wang, Guang-Zhong [1 ]
Yang, Li
Affiliations
[1] Chinese Acad Sci, Shanghai Inst Nutr & Hlth, Beijing, Peoples R China
Funding
National Natural Science Foundation of China; China Postdoctoral Science Foundation
Keywords
RNA localization; machine learning; nucleotide feature; motif; RNA binding protein; circular RNA; long noncoding RNAs; messenger RNA; transcription; mechanisms; repeats
DOI
10.1093/bib/bbac509
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Different RNAs have distinct subcellular localizations. However, the nucleotide features that determine the distinct distributions of lncRNAs and mRNAs have yet to be fully characterized. Here, we develop RNAlight, a machine learning model based on LightGBM, to identify nucleotide k-mers contributing to the subcellular localizations of mRNAs and lncRNAs. Using the Tree SHAP algorithm, RNAlight extracts nucleotide features associated with cytoplasmic or nuclear localization of RNAs, revealing a sequence basis for distinct RNA subcellular localizations. By assembling k-mers into longer sequence features and mapping these to known motifs associated with RNA-binding proteins (RBPs), we additionally uncovered different types of sequence features and their associated RBPs for lncRNAs and mRNAs with distinct subcellular localizations. Finally, we extended RNAlight to precisely predict the subcellular localizations of other types of RNAs, including snRNAs, snoRNAs and different circular RNA transcripts, suggesting that RNAlight can be applied generally to RNA subcellular localization prediction.
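The abstract represents each RNA as a set of nucleotide k-mers and mentions assembling important k-mers into longer sequence features, but does not describe either step in detail. The following is a minimal Python sketch of both ideas under stated assumptions. It assumes k = 3, 4 and 5 with per-k normalization by window count. The function names `kmer_frequencies`, `kmer_vocab` and `assemble_kmers`, and the greedy (k-1)-overlap merge rule, are hypothetical illustrations and not RNAlight's actual code or assembly procedure.

```python
from itertools import product

ALPHABET = "ACGU"  # RNA alphabet; T in the input is treated as U


def kmer_vocab(ks=(3, 4, 5)):
    """All k-mers for each k in ks, in lexicographic order over ALPHABET.

    ks=(3, 4, 5) is an illustrative assumption, not RNAlight's setting.
    """
    return ["".join(p) for k in ks for p in product(ALPHABET, repeat=k)]


def kmer_frequencies(seq, ks=(3, 4, 5)):
    """Return one feature vector of k-mer frequencies for a sequence.

    Counts for each k are divided by the number of length-k windows, so
    they sum to 1 per k when the sequence contains only A/C/G/U.
    Windows containing other symbols (e.g. N) are skipped.
    """
    seq = seq.upper().replace("T", "U")
    vocab = kmer_vocab(ks)
    index = {kmer: i for i, kmer in enumerate(vocab)}
    features = [0.0] * len(vocab)
    for k in ks:
        n_windows = len(seq) - k + 1
        if n_windows <= 0:
            continue  # sequence shorter than k
        for start in range(n_windows):
            j = index.get(seq[start:start + k])
            if j is not None:
                features[j] += 1.0 / n_windows
    return features


def assemble_kmers(kmers):
    """Greedily merge same-length k-mers (k >= 2) that overlap by k-1 bases.

    Hypothetical assembly rule for illustration only.
    """
    pool = list(dict.fromkeys(kmers))  # de-duplicate, keep input order
    contigs = []
    while pool:
        contig = pool.pop(0)
        k = len(contig)  # all input k-mers are assumed to share this length
        grew = True
        while grew:
            grew = False
            for kmer in pool:
                if kmer[:-1] == contig[-(k - 1):]:  # extend on the right
                    contig += kmer[-1]
                elif kmer[1:] == contig[:k - 1]:  # extend on the left
                    contig = kmer[0] + contig
                else:
                    continue
                pool.remove(kmer)
                grew = True
                break  # restart the scan after each extension
        contigs.append(contig)
    return contigs


print(len(kmer_frequencies("ACGUACGU")))  # 64 + 256 + 1024 = 1344 features
print(assemble_kmers(["AUGC", "UGCA", "GCAU"]))  # ['AUGCAU']
```

Stacking one `kmer_frequencies` vector per transcript yields a feature matrix. Such a matrix could, for example, be fitted with `lightgbm.LGBMClassifier` and explained with `shap.TreeExplainer`. The record does not state which k values, hyperparameters or assembly rules RNAlight actually uses.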
Pages: 13