A random forest model for predicting exosomal proteins using evolutionary information and motifs

Cited by: 4
Authors
Arora, Akanksha [1]
Patiyal, Sumeet [1]
Sharma, Neelam [1]
Devi, Naorem Leimarembi [1]
Kaur, Dashleen [1]
Raghava, Gajendra P. S. [1, 2]
Affiliations
[1] Indraprastha Inst Informat Technol, Dept Computat Biol, New Delhi, India
[2] Indraprastha Inst Informat Technol, Dept Computat Biol, Okhla Ind Estate,Phase 3, New Delhi 110020, India
Keywords
exosomal proteins; exosomes; extracellular vesicles; machine learning; motifs; PSSM profile; GENERATION; SIGNATURE
DOI
10.1002/pmic.202300231
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Non-invasive diagnostics and therapies are crucial for sparing patients painful procedures. Exosomal proteins can serve as important biomarkers for such advancements. In this study, we attempted to build a model to predict exosomal proteins. All models were trained, tested, and evaluated on a non-redundant dataset comprising 2831 exosomal and 2831 non-exosomal proteins, in which no two proteins have more than 40% similarity. Initially, the standard similarity-based method, Basic Local Alignment Search Tool (BLAST), was used to predict exosomal proteins; it failed because of the low level of similarity in the dataset. To overcome this challenge, machine learning (ML) based models were developed using compositional and evolutionary features of proteins, achieving an area under the receiver operating characteristic curve (AUROC) of 0.73. Our analysis also indicated that exosomal proteins contain a variety of sequence-based motifs that can be used to predict them. Hence, we developed a hybrid method combining motif-based and ML-based approaches for predicting exosomal proteins, achieving a maximum AUROC of 0.85 and a Matthews correlation coefficient (MCC) of 0.56 on an independent dataset. This hybrid model performs better than presently available methods when assessed on an independent dataset. A web server and standalone software, ExoProPred (), have been created to help scientists predict and discover exosomal proteins and find the functional motifs present in them.
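The two metrics reported in the abstract, AUROC and MCC, and the idea of combining a motif hit with an ML score can be illustrated with a minimal, dependency-free sketch. The abstract does not say how ExoProPred combines the two signals, so `hybrid_score` and its `boost` parameter are hypothetical choices made only for illustration; the metric functions follow the standard definitions.

```python
from math import sqrt


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (1 = exosomal, 0 = non-exosomal)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0.0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def auroc(y_true, scores):
    """AUROC as the probability that a random positive outscores a random negative.

    Ties count as half a win. This O(P * N) pairwise form is fine for small examples.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def hybrid_score(ml_prob, has_motif, boost=0.5):
    """HYPOTHETICAL combination rule (not taken from the paper):
    raise the ML probability by `boost` when an exosomal motif is found, capped at 1.0.
    """
    return min(1.0, ml_prob + boost) if has_motif else ml_prob


# Toy data, invented for illustration only.
labels = [1, 1, 0, 0]
ml_probs = [0.9, 0.4, 0.5, 0.1]
motif_hits = [False, True, False, False]

hybrid = [hybrid_score(p, m) for p, m in zip(ml_probs, motif_hits)]
print("AUROC (ML only):", auroc(labels, ml_probs))
print("AUROC (hybrid): ", auroc(labels, hybrid))
print("MCC at 0.5 cut: ", mcc(labels, [1 if s >= 0.5 else 0 for s in hybrid]))
```

In this toy case the motif hit lifts a low-scoring positive above the negatives, which is the kind of effect a hybrid approach aims for; the actual gains reported in the abstract come from the authors' own model and data.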
Pages: 14
Related papers
50 in total
  • [41] Predicting current and hydrogen productions from microbial electrolysis cells using random forest model
    Yoon, Jinyoung
    Cheong, Dae-Yeol
    Baek, Gahyun
    APPLIED ENERGY, 2024, 371
  • [42] Predicting Rock Hardness and Abrasivity Using Hyperspectral Imaging Data and Random Forest Regressor Model
    Ghadernejad, Saleh
    Esmaeili, Kamran
    REMOTE SENSING, 2024, 16 (20)
  • [43] Predicting temperature variability in major Indian cities using Random Forest Regression (RFR) Model
    Alone, Ashish
    Shukla, Anoop Kumar
    Nandan, Gopal
    Pattanaik, D. R.
    JOURNAL OF EARTH SYSTEM SCIENCE, 2025, 134 (01)
  • [44] Predicting the functional outcomes of anti-LGI1 encephalitis using a random forest model
    Li, Gongfei
    Liu, Xiao
    Wang, Minghui
    Yu, Tingting
    Ren, Jiechuan
    Wang, Qun
    ACTA NEUROLOGICA SCANDINAVICA, 2022, 146 (02): 137-143
  • [45] Predicting Student Dropout in Self-Paced MOOC Course Using Random Forest Model
    Dass, Sheran
    Gary, Kevin
    Cunningham, James
    INFORMATION, 2021, 12 (11)
  • [46] Random Forest Machine Learning Model for Predicting Combustion Feedback Information of a Natural Gas Spark Ignition Engine
    Liu, Jinlong
    Ulishney, Christopher
    Dumitrescu, Cosmin Emil
    JOURNAL OF ENERGY RESOURCES TECHNOLOGY-TRANSACTIONS OF THE ASME, 2021, 143 (01)
  • [47] DPROT: prediction of disordered proteins using evolutionary information
    Deepti Sethi
    Aarti Garg
    G. P. S. Raghava
    Amino Acids, 2008, 35
  • [48] DPROT: prediction of disordered proteins using evolutionary information
    Sethi, Deepti
    Garg, Aarti
    Raghava, G. P. S.
    AMINO ACIDS, 2008, 35 (03): 599-605
  • [49] Predicting Popularity of Online Articles using Random Forest Regression
    Shreyas, R.
    Akshata, D. M.
    Mahanand, B. S.
    Shagun, B.
    Abhishek, C. M.
    2016 SECOND INTERNATIONAL CONFERENCE ON COGNITIVE COMPUTING AND INFORMATION PROCESSING (CCIP), 2016
  • [50] Predicting mortality risk for preterm infants using random forest
    Jennifer Lee
    Jinjin Cai
    Fuhai Li
    Zachary A. Vesoulis
    Scientific Reports, 11