Machine learning methods for predictive proteomics

Cited by: 47
Authors
Barla, Annalisa [1]
Jurman, Giuseppe [1]
Riccadonna, Samantha [1]
Merler, Stefano [1]
Chierici, Marco [1]
Furlanello, Cesare [1]
Affiliation
[1] FBK, MPBA Unit, I-38100 Trento, Italy
Keywords
proteomics; selection bias; feature selection; functional profiling
DOI
10.1093/bib/bbn008
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
The search for predictive biomarkers of disease from high-throughput mass spectrometry (MS) data requires a complex analysis path. Preprocessing and machine-learning modules are pipelined, starting from raw spectra, to set up a predictive classifier based on a shortlist of candidate features. As a machine-learning problem, proteomic profiling on MS data requires the same caution as the microarray case. The risk of overfitting and of selection bias effects is pervasive: not only do potential features easily outnumber samples by a factor of 10^3, but it is also easy to neglect information-leakage effects during preprocessing from spectra to peaks. The aim of this review is to explain how to build a general-purpose design analysis protocol (DAP) for predictive proteomic profiling: we show how to limit leakage due to parameter tuning and how to organize classification and ranking on large numbers of replicate versions of the original data to avoid selection bias. The DAP can be used with alternative components, i.e. with different preprocessing methods (peak clustering or wavelet based), classifiers (e.g. Support Vector Machine, SVM) or feature ranking methods (recursive feature elimination or I-Relief). A procedure for assessing the stability and predictive value of the resulting biomarker list is also provided. The approach is exemplified with experiments on synthetic datasets (from the Cromwell MS simulator) and with publicly available datasets from cancer studies.
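The selection bias described in the abstract can be illustrated with a toy sketch. This is not the authors' DAP: it assumes pure-noise data, substitutes a mean-difference ranking for recursive feature elimination or I-Relief, and substitutes a nearest-centroid rule for the SVM. All function names (`rank_features`, `stratified_split`, `evaluate`) are hypothetical. It contrasts ranking features once on all samples (leaky) with ranking them again inside each replicate's training split.

```python
import random
import statistics


def rank_features(X, y, k):
    """Return indices of the k features with the largest class-mean difference."""
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append(abs(statistics.fmean(pos) - statistics.fmean(neg)))
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


def stratified_split(y, test_fraction, rng):
    """Randomly split sample indices into train/test, keeping both classes in each."""
    train, test = [], []
    for label in (0, 1):
        idx = [i for i, l in enumerate(y) if l == label]
        rng.shuffle(idx)
        n_test = max(1, int(len(idx) * test_fraction))
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return train, test


def nearest_centroid_predict(X_train, y_train, X_test, features):
    """Assign each test row to the class whose centroid is closest (toy classifier)."""
    centroids = {}
    for label in (0, 1):
        rows = [row for row, l in zip(X_train, y_train) if l == label]
        centroids[label] = [statistics.fmean([r[j] for r in rows]) for j in features]
    preds = []
    for row in X_test:
        dist = {
            label: sum((row[j] - c) ** 2 for j, c in zip(features, cen))
            for label, cen in centroids.items()
        }
        preds.append(min(dist, key=dist.get))
    return preds


def evaluate(X, y, k, n_replicates, leak, seed=0):
    """Mean test accuracy over replicate splits.

    leak=True ranks features once on ALL samples (test data leaks into selection);
    leak=False re-ranks on each replicate's training samples only.
    """
    rng = random.Random(seed)
    leaked = rank_features(X, y, k) if leak else None
    accuracies = []
    for _ in range(n_replicates):
        train, test = stratified_split(y, 0.25, rng)
        X_tr, y_tr = [X[i] for i in train], [y[i] for i in train]
        X_te, y_te = [X[i] for i in test], [y[i] for i in test]
        features = leaked if leak else rank_features(X_tr, y_tr, k)
        preds = nearest_centroid_predict(X_tr, y_tr, X_te, features)
        accuracies.append(sum(p == t for p, t in zip(preds, y_te)) / len(y_te))
    return statistics.fmean(accuracies)


if __name__ == "__main__":
    # Assumed toy data: 40 samples, 500 noise features, labels unrelated to features.
    data_rng = random.Random(42)
    X = [[data_rng.gauss(0.0, 1.0) for _ in range(500)] for _ in range(40)]
    y = [i % 2 for i in range(40)]
    print("selection on all data:", evaluate(X, y, k=10, n_replicates=20, leak=True))
    print("selection inside split:", evaluate(X, y, k=10, n_replicates=20, leak=False))
```

Because the labels carry no signal, an honest estimate should hover near chance; a clearly higher figure from the leaky variant would reflect selection bias rather than predictive value.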
Pages: 119-128
Page count: 10