A random forest model for predicting exosomal proteins using evolutionary information and motifs

Cited by: 3
Authors
Arora, Akanksha [1]
Patiyal, Sumeet [1]
Sharma, Neelam [1]
Devi, Naorem Leimarembi [1]
Kaur, Dashleen [1]
Raghava, Gajendra P. S. [1,2]
Affiliations
[1] Indraprastha Inst Informat Technol, Dept Computat Biol, New Delhi, India
[2] Indraprastha Inst Informat Technol, Dept Computat Biol, Okhla Ind Estate, Phase 3, New Delhi 110020, India
Keywords
exosomal proteins; exosomes; extracellular vesicles; machine learning; motifs; PSSM profile; GENERATION; SIGNATURE;
DOI
10.1002/pmic.202300231
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Non-invasive diagnostics and therapies are crucial to prevent patients from undergoing painful procedures. Exosomal proteins can serve as important biomarkers for such advancements. In this study, we attempted to build a model to predict exosomal proteins. All models were trained, tested, and evaluated on a non-redundant dataset comprising 2831 exosomal and 2831 non-exosomal proteins, in which no two proteins share more than 40% similarity. Initially, the standard similarity-based method, the Basic Local Alignment Search Tool (BLAST), was used to predict exosomal proteins; it failed because of the low level of similarity in the dataset. To overcome this challenge, machine learning (ML) based models were developed using compositional and evolutionary features of proteins, achieving an area under the receiver operating characteristic curve (AUROC) of 0.73. Our analysis also indicated that exosomal proteins contain a variety of sequence-based motifs that can be used to predict them. Hence, we developed a hybrid method combining motif-based and ML-based approaches for predicting exosomal proteins, achieving a maximum AUROC of 0.85 and a Matthews correlation coefficient (MCC) of 0.56 on an independent dataset. This hybrid model performs better than presently available methods when assessed on an independent dataset. A web server and a standalone software ExoProPred () have been created to help scientists predict and discover exosomal proteins and find functional motifs present in them.
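The abstract does not say how the motif-based and ML-based scores are combined, so the following is only a minimal sketch under stated assumptions: a classifier probability is raised by a fixed bonus when any known motif occurs in the sequence. The function names, the `motif_bonus` value, and the example motif and sequence are hypothetical and are not taken from ExoProPred. The two helper functions show how the reported metrics, MCC and AUROC, are computed from labels and scores.

```python
import math


def hybrid_score(ml_probability, sequence, motifs, motif_bonus=0.5):
    """Combine an ML probability with a motif hit.

    Assumption (not from the paper): a fixed bonus is added to the
    ML probability when any motif is a substring of the sequence.
    """
    has_motif = any(motif in sequence for motif in motifs)
    return ml_probability + (motif_bonus if has_motif else 0.0)


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (0 or 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal count is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def auroc(y_true, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked
    correctly, counting ties as half. Returns None if a class is empty."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        return None
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical motif and sequence, for illustration only.
    print(hybrid_score(0.4, "MKTAYIAK", ["TAY"]))
    print(mcc([1, 1, 0, 0], [1, 0, 0, 0]))
    print(auroc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

In a scheme like this, a motif hit can push a borderline ML score above the decision threshold, which is one plausible reason a hybrid model could outperform the ML model alone; the actual combination rule should be checked against the paper.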
Pages: 14