A random forest model for predicting exosomal proteins using evolutionary information and motifs

Cited by: 4
Authors
Arora, Akanksha [1]
Patiyal, Sumeet [1]
Sharma, Neelam [1]
Devi, Naorem Leimarembi [1]
Kaur, Dashleen [1]
Raghava, Gajendra P. S. [1, 2]
Affiliations
[1] Indraprastha Inst Informat Technol, Dept Computat Biol, New Delhi, India
[2] Indraprastha Inst Informat Technol, Dept Computat Biol, Okhla Ind Estate,Phase 3, New Delhi 110020, India
Keywords
exosomal proteins; exosomes; extracellular vesicles; machine learning; motifs; PSSM profile; GENERATION; SIGNATURE
DOI
10.1002/pmic.202300231
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Non-invasive diagnostics and therapies are crucial for sparing patients painful procedures, and exosomal proteins can serve as important biomarkers for such advances. In this study, we attempted to build a model to predict exosomal proteins. All models were trained, tested, and evaluated on a non-redundant dataset comprising 2831 exosomal and 2831 non-exosomal proteins, in which no two proteins have more than 40% similarity. Initially, the standard similarity-based method, Basic Local Alignment Search Tool (BLAST), was used to predict exosomal proteins, but it failed because of the low level of similarity in the dataset. To overcome this challenge, machine learning (ML) based models were developed using compositional and evolutionary features of proteins, achieving an area under the receiver operating characteristic curve (AUROC) of 0.73. Our analysis also indicated that exosomal proteins contain a variety of sequence-based motifs that can be used to predict them. Hence, we developed a hybrid method combining motif-based and ML-based approaches, achieving a maximum AUROC of 0.85 and a Matthews correlation coefficient (MCC) of 0.56 on an independent dataset. This hybrid model performs better than currently available methods when assessed on an independent dataset. A web server and standalone software, ExoProPred, have been created to help scientists predict and discover exosomal proteins and find the functional motifs present in them.
Pages: 14
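
The abstract describes a hybrid method that combines motif-based and ML-based predictions and reports MCC as one of its metrics. The following is only a minimal sketch of how such a combination could look, not the authors' implementation: the motif patterns, the `MOTIF_WEIGHT` value, and the function names `hybrid_score` and `mcc` are all hypothetical and chosen for illustration. It assumes an ML model has already produced a probability for each protein sequence.

```python
import math
import re

# Hypothetical motif patterns for illustration only; NOT taken from the paper.
EXOSOMAL_MOTIFS = [r"KDEL", r"G.GK[ST]"]
NON_EXOSOMAL_MOTIFS = [r"C..C.{2,4}H"]

# Assumed size of the adjustment applied when a motif is found.
MOTIF_WEIGHT = 0.5


def hybrid_score(sequence: str, ml_probability: float) -> float:
    """Adjust an ML probability up or down depending on motif hits."""
    score = ml_probability
    if any(re.search(pattern, sequence) for pattern in EXOSOMAL_MOTIFS):
        score += MOTIF_WEIGHT
    if any(re.search(pattern, sequence) for pattern in NON_EXOSOMAL_MOTIFS):
        score -= MOTIF_WEIGHT
    return score


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # common convention when any marginal count is zero
    return (tp * tn - fp * fn) / denominator
```

In this sketch a sequence with no motif hit keeps its ML probability unchanged, so the motif component only matters for proteins that carry one of the listed patterns. The resulting scores would then be thresholded to obtain the class labels from which MCC is computed.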