Interpretable machine learning with tree-based Shapley additive explanations: Application to metabolomics datasets for binary classification

Cited by: 25
Authors
Bifarin, Olatomiwa O. [1 ,2 ]
Affiliations
[1] Univ Georgia, Dept Biochem & Mol Biol, Athens, GA 30602 USA
[2] Georgia Inst Technol, Sch Chem & Biochem, Atlanta, GA 30602 USA
Source
PLOS ONE | 2023 / Vol. 18 / Issue 05
Keywords
METABOLIGHTS; REPOSITORY;
DOI
10.1371/journal.pone.0284315
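Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification code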
07 ; 0710 ; 09 ;
Abstract
Machine learning (ML) models are used in clinical metabolomics studies, most notably for biomarker discovery, to identify metabolites that discriminate between case and control groups. Model interpretability is essential for improving understanding of the underlying biomedical problem and for bolstering confidence in these discoveries. In metabolomics, partial least squares discriminant analysis (PLS-DA) and its variants are widely used, partly because the model can be interpreted through Variable Influence in Projection (VIP) scores, a global interpretability method. Herein, Tree-based Shapley Additive Explanations (Tree SHAP), an interpretable ML method grounded in game theory, was used to explain ML models with local explanation properties. In this study, binary classification experiments were conducted on three published metabolomics datasets using PLS-DA, random forests, gradient boosting, and extreme gradient boosting (XGBoost). Using one of the datasets, the PLS-DA model was explained using VIP scores, while one of the best-performing models, a random forest model, was interpreted using Tree SHAP. The results show that SHAP offers greater explanatory depth than PLS-DA's VIP, making it a powerful method for rationalizing machine learning predictions from metabolomics studies.
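The "local explanation" property mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's code and not Tree SHAP itself: it is a brute-force Shapley value computation over every feature coalition, which is only feasible for a handful of features (Tree SHAP is designed to compute attributions of this kind efficiently for tree ensembles). The function `toy_model`, the three "metabolite intensities", and the all-zero reference sample are hypothetical and chosen only for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, background):
    """Exact Shapley values for a single sample x (brute force).

    Features outside a coalition take the background sample's value.
    This single-reference baseline is an assumption made for illustration.
    """
    n = len(x)

    def value(coalition):
        # Model output when only the features in `coalition` are "present".
        z = [x[i] if i in coalition else background[i] for i in range(n)]
        return predict(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_model(z):
    # Hypothetical score: one additive term plus one interaction term.
    return 0.5 * z[0] + 0.2 * z[1] * z[2]


sample = [2.0, 1.0, 3.0]      # hypothetical metabolite intensities for one subject
reference = [0.0, 0.0, 0.0]   # hypothetical baseline sample

phi = shapley_values(toy_model, sample, reference)
base = toy_model(reference)

# Local accuracy: baseline plus per-feature contributions recovers the prediction.
print(phi, base + sum(phi), toy_model(sample))
```

Each subject gets its own contribution vector, which is what distinguishes this kind of explanation from a global ranking such as VIP scores: the interaction term's effect is split equally between the two features that produce it, and the contributions always sum to the difference between the prediction and the baseline.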
Pages: 21
Related papers
50 in total
  • [41] Landslide Modeling in a Tropical Mountain Basin Using Machine Learning Algorithms and Shapley Additive Explanations
    Vega, Johnny
    Sepulveda-Murillo, Fabio Humberto
    Parra, Melissa
    AIR SOIL AND WATER RESEARCH, 2023, 16
  • [42] Failure mode and effects analysis of RC members based on machine-learning-based SHapley Additive exPlanations (SHAP) approach
    Mangalathu, Sujith
    Hwang, Seong-Hoon
    Jeon, Jong-Su
    ENGINEERING STRUCTURES, 2020, 219
  • [43] Predicting and analyzing flood susceptibility using boosting-based ensemble machine learning algorithms with SHapley Additive exPlanations
    Aydin, Halit Enes
    Iban, Muzaffer Can
    NATURAL HAZARDS, 2023, 116 (03) : 2957 - 2991
  • [44] LEADING PREDICTORS OF ECONOMIC BURDEN AMONG POSTMENOPAUSAL WOMEN WITH HEART FAILURE: AN APPLICATION OF MACHINE LEARNING WITH XGBOOST AND SHAPLEY ADDITIVE EXPLANATIONS
    Dehghan, A.
    Park, C.
    Sambamoorthi, N.
    Shen, C.
    Shara, N.
    Sambamoorthi, U.
    VALUE IN HEALTH, 2023, 26 (06) : S289 - S290
  • [46] Machine Learning Models Based on Grid-Search Optimization and Shapley Additive Explanations (SHAP) for Early Stroke Prediction
    Al Mamlook, Rabia Emhamed
    Lahwal, Fathia
    Elgeberi, Najat
    Obeidat, Muhammad
    Al-Na'amneh, Qais
    Nasayreh, Ahmad
    Gharaibeh, Hasan
    Gharaibeh, Tasnim
    Bzizi, Hanin
    4TH INTERDISCIPLINARY CONFERENCE ON ELECTRICS AND COMPUTER, INTCEC 2024, 2024
  • [47] An interpretable approach combining Shapley additive explanations and LightGBM based on data augmentation for improving wheat yield estimates
    Wang, Ying
    Wang, Pengxin
    Tansey, Kevin
    Liu, Junming
    Delaney, Bethany
    Quan, Wenting
    COMPUTERS AND ELECTRONICS IN AGRICULTURE, 2025, 229
  • [48] Protein pKa Prediction by Tree-Based Machine Learning
    Chen, Ada Y.
    Lee, Juyong
    Damjanovic, Ana
    Brooks, Bernard R.
    JOURNAL OF CHEMICAL THEORY AND COMPUTATION, 2022, 18 (04) : 2673 - 2686
  • [49] Runtime Optimizations for Tree-based Machine Learning Models
    Asadi, Nima
    Lin, Jimmy
    de Vries, Arjen P.
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2014, 26 (09) : 2281 - 2292
  • [50] Tree-based Machine Learning Methods for Survey Research
    Kern, Christoph
    Klausch, Thomas
    Kreuter, Frauke
    SURVEY RESEARCH METHODS, 2019, 13 (01) : 73 - 93