A random forest model for predicting exosomal proteins using evolutionary information and motifs

Cited by: 4
Authors
Arora, Akanksha [1]
Patiyal, Sumeet [1]
Sharma, Neelam [1]
Devi, Naorem Leimarembi [1]
Kaur, Dashleen [1]
Raghava, Gajendra P. S. [1, 2]
Affiliations
[1] Indraprastha Inst Informat Technol, Dept Computat Biol, New Delhi, India
[2] Indraprastha Inst Informat Technol, Dept Computat Biol, Okhla Ind Estate, Phase 3, New Delhi 110020, India
Keywords
exosomal proteins; exosomes; extracellular vesicles; machine learning; motifs; PSSM profile; GENERATION; SIGNATURE;
DOI
10.1002/pmic.202300231
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Non-invasive diagnostics and therapies are crucial for sparing patients painful procedures, and exosomal proteins can serve as important biomarkers for such advances. In this study, we attempted to build a model to predict exosomal proteins. All models were trained, tested, and evaluated on a non-redundant dataset comprising 2831 exosomal and 2831 non-exosomal proteins, in which no two proteins share more than 40% similarity. Initially, the standard similarity-based method, Basic Local Alignment Search Tool (BLAST), was used to predict exosomal proteins; it failed because of the low level of similarity in the dataset. To overcome this challenge, machine learning (ML) based models were developed using compositional and evolutionary features of proteins, achieving an area under the receiver operating characteristic curve (AUROC) of 0.73. Our analysis also indicated that exosomal proteins contain a variety of sequence-based motifs that can be used to predict them. Hence, we developed a hybrid method combining motif-based and ML-based approaches, which achieved a maximum AUROC of 0.85 and a Matthews correlation coefficient (MCC) of 0.56 on an independent dataset. This hybrid model performs better than currently available methods when assessed on an independent dataset. A web server and standalone software, ExoProPred, have been created to help scientists predict and discover exosomal proteins and find the functional motifs present in them.
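The abstract does not say how the motif-based and ML-based predictions are combined, so the following is only a minimal sketch of one plausible scheme: a fixed bonus is added to the classifier's probability whenever a motif is found in the sequence. The motif patterns, the `motif_weight` value, and all function names are hypothetical placeholders and are not taken from the paper or from ExoProPred. The `mcc` helper shows how the reported MCC metric is computed from binary predictions.

```python
import math
import re

# Hypothetical placeholder patterns; NOT the motifs reported in the paper.
EXAMPLE_MOTIFS = [re.compile("KDEL"), re.compile("G.GK[ST]")]


def motif_hit(sequence, motifs=EXAMPLE_MOTIFS):
    """Return True if any motif pattern occurs anywhere in the sequence."""
    return any(m.search(sequence) for m in motifs)


def hybrid_score(ml_probability, sequence, motif_weight=0.5, motifs=EXAMPLE_MOTIFS):
    """Combine an ML probability with a motif bonus (assumed rule, capped at 1.0)."""
    bonus = motif_weight if motif_hit(sequence, motifs) else 0.0
    return min(1.0, ml_probability + bonus)


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (1 = positive, 0 = negative)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0.0 when any marginal count is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom
```

In this sketch, `ml_probability` would come from a trained classifier such as a random forest; a sequence is then called exosomal when `hybrid_score` exceeds a chosen threshold.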
Pages: 14