A holistic approach towards a generalizable machine learning predictor of cell penetrating peptides

Times cited: 1
Authors
Ismail, Bahaa [1]
Jones, Sarah [1]
Howl, John [1]
Affiliations
[1] Wolverhampton Univ, Res Inst Healthcare Sci, Wulfruna St, Wolverhampton WV1 1LY, England
Keywords
amino acid composition; cellular uptake; CPP; data pre-processing; drug delivery; feature optimization; machine learning; peptide classification; SVM
DOI
10.1071/CH22247
Chinese Library Classification number
O6 [Chemistry]
Discipline classification code
0703
Abstract
The development of machine learning (ML) predictors does not necessarily require the employment of expansive classifiers and complex feature encoding schemes to achieve the highest accuracy scores. Rather, it requires data pre-processing, feature optimization, and robust evaluation to ensure consistent results and generalizability. Herein, we describe a multi-stage process to develop a reliable ML predictor of cell penetrating peptides (CPPs). We emphasize the challenges of: (i) the generation of representative datasets with all required pre-processing procedures; (ii) comprehensive and exclusive encoding of peptides using their amino acid composition; (iii) obtaining an optimized feature set using a simple classifier (support vector machine, SVM); (iv) ensuring consistent results; and (v) verifying generalizability at the highest achievable accuracy scores. Two peptide sub-spaces were used to generate the negative examples, which are required, along with the known CPPs, to train the classifier. These included: (i) randomly generated peptides with all amino acid types being equally represented and (ii) peptides extracted from receptor proteins. Results indicated that the randomly generated dataset performed perfectly well within its own peptide sub-space but generalized poorly to the other sub-space. Conversely, the dataset extracted from receptor proteins, while achieving lower accuracies, generalized perfectly to the other peptide sub-space. We combined the qualities of these two datasets by utilizing the average of their predictions within our ultimate framework. This functional ML predictor, WLVCPP, and associated software and datasets can be downloaded from https://github.com/BahaaIsmail/WLVCPP.
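Three of the steps named in the abstract can be illustrated with a minimal Python sketch: encoding a peptide by its amino acid composition, drawing random negative examples with every residue type equally likely, and averaging the predictions of two models. This is not the WLVCPP code. The function names, the plain 20-fraction encoding, the peptide length range, and the toy sequence are all assumptions made for illustration, and the SVM itself is omitted, so the two scores passed to `average_prediction` are placeholders rather than real model outputs.

```python
import random
from collections import Counter

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(peptide):
    """Return the fraction of each standard residue, in AMINO_ACIDS order.

    Assumption: a plain 20-dimensional composition vector; the published
    feature set may be richer than this.
    """
    if not peptide:
        raise ValueError("peptide must contain at least one residue")
    counts = Counter(peptide.upper())
    return [counts[aa] / len(peptide) for aa in AMINO_ACIDS]


def random_peptide(length, rng):
    """Draw a peptide in which every residue type is equally likely."""
    return "".join(rng.choice(AMINO_ACIDS) for _ in range(length))


def average_prediction(score_a, score_b):
    """Combine the scores of two classifiers for one peptide by their mean."""
    return (score_a + score_b) / 2.0


# Toy sequence, chosen only to make the arithmetic easy to follow.
features = amino_acid_composition("RRRKKW")

# Hypothetical negative examples; the 5-30 length range is an assumption.
rng = random.Random(0)
negatives = [random_peptide(rng.randint(5, 30), rng) for _ in range(3)]

# Placeholder scores standing in for the two trained models' outputs.
combined = average_prediction(0.9, 0.3)
```

In a full pipeline, the composition vectors of known CPPs and of the negative examples would be passed to a classifier such as an SVM, once per negative dataset, and `average_prediction` would then be applied to the two resulting scores for each query peptide.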
Pages: 493-506
Number of pages: 14
Source: Australian Journal of Chemistry, 2023, 76 (08): 493-506