Interpretable machine learning with tree-based shapley additive explanations: Application to metabolomics datasets for binary classification

Cited by: 25
Authors
Bifarin, Olatomiwa O. [1, 2]
Affiliations
[1] Univ Georgia, Dept Biochem & Mol Biol, Athens, GA 30602 USA
[2] Georgia Inst Technol, Sch Chem & Biochem, Atlanta, GA 30332 USA
Source
PLOS ONE | 2023 / Vol. 18 / No. 5
Keywords
METABOLIGHTS; REPOSITORY
DOI
10.1371/journal.pone.0284315
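Chinese Library Classification (CLC) codes
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes
07; 0710; 09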
Abstract
Machine learning (ML) models are used in clinical metabolomics studies, most notably for biomarker discovery, to identify metabolites that discriminate between case and control groups. Model interpretability is important because it improves understanding of the underlying biomedical problem and bolsters confidence in these discoveries. In metabolomics, partial least squares discriminant analysis (PLS-DA) and its variants are widely used. This is partly because the model can be interpreted through Variable Importance in Projection (VIP) scores, a global interpretation method. Herein, Tree-based SHapley Additive exPlanations (Tree SHAP), an interpretable ML method grounded in game theory, was used to explain ML models through local (per-sample) explanations. In this study, binary classification experiments were conducted on three published metabolomics datasets using PLS-DA, random forests, gradient boosting, and extreme gradient boosting (XGBoost). On one of the datasets, the PLS-DA model was explained using VIP scores. One of the best-performing models on that dataset, a random forest, was interpreted using Tree SHAP. The results show that SHAP offers greater explanatory depth than PLS-DA's VIP scores, making it a powerful method for rationalizing machine learning predictions in metabolomics studies.
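The game-theoretic idea behind SHAP's local explanations can be illustrated with a brute-force Shapley computation on a toy model. The sketch below is an illustration only: `toy_tree`, its thresholds, and the three "metabolite" features are hypothetical and are not taken from the paper. It also replaces absent features with a single reference sample, which is a simplifying assumption. The actual Tree SHAP algorithm computes this kind of attribution far more efficiently by exploiting tree structure, and it estimates expectations from training or background data. The example shows the property that distinguishes SHAP from a global score such as VIP. Each prediction gets its own per-feature contributions, and those contributions sum exactly to the difference between that prediction and the reference prediction.

```python
from itertools import combinations
from math import factorial


def toy_tree(x):
    """Hypothetical depth-2 decision tree returning P(case) from three metabolite levels."""
    if x[0] > 1.0:                      # split on hypothetical metabolite A
        return 0.9 if x[1] > 0.5 else 0.6
    return 0.3 if x[2] > 2.0 else 0.1   # split on hypothetical metabolite C


def shapley_values(f, x, reference):
    """Exact Shapley values of f at sample x by enumerating all feature coalitions.

    Features outside a coalition are set to the reference sample's values
    (a simplifying assumption; Tree SHAP uses training/background data instead).
    """
    n = len(x)

    def value(subset):
        # Coalition members take the explained sample's values; the rest take the reference's.
        z = [x[j] if j in subset else reference[j] for j in range(n)]
        return f(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for k in range(n):  # k = size of the coalition that excludes feature i
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for s in combinations(others, k):
                s = set(s)
                # Weighted marginal contribution of feature i to coalition s
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


x = [1.5, 0.8, 0.5]          # hypothetical sample to explain
reference = [0.5, 0.2, 3.0]  # hypothetical reference (baseline) sample
phi = shapley_values(toy_tree, x, reference)
print([round(p, 3) for p in phi])  # [0.55, 0.15, -0.1]
```

Here the prediction for `x` is 0.9 and the reference prediction is 0.3. The three contributions (0.55, 0.15, -0.1) add up to the 0.6 gap, which is the local additivity property. Exhaustive enumeration scales exponentially with the number of features, so it is practical only for toy cases. This cost is the reason tree-specific algorithms such as Tree SHAP exist.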
Pages: 21
Related papers
50 in total
  • [21] Predicting the hardgrove grindability index using interpretable decision tree-based machine learning models
    Chen, Yuxin
    Khandelwal, Manoj
    Onifade, Moshood
    Zhou, Jian
    Lawal, Abiodun Ismail
    Bada, Samson Oluwaseyi
    Genc, Bekir
    FUEL, 2025, 384
  • [22] Hybrid machine learning model and Shapley additive explanations for compressive strength of sustainable concrete
    Wu, Yanqi
    Zhou, Yisong
    CONSTRUCTION AND BUILDING MATERIALS, 2022, 330
  • [23] Learning Interpretable, Tree-Based Projection Mappings for Nonlinear Embeddings
    Zharmagambetov, Arman
    Carreira-Perpinan, Miguel A.
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, 2022, 151
  • [24] Application of decision tree-based ensemble learning in the classification of breast cancer
    Ghiasi, Mohammad M.
    Zendehboudi, Sohrab
    COMPUTERS IN BIOLOGY AND MEDICINE, 2021, 128
  • [25] Influence of metakaolin on pervious concrete strength: a machine learning approach with shapley additive explanations
    Sathiparan, Navaratnarajah
    Jeyananthan, Pratheeba
    Subramaniam, Daniel Niruban
    MULTISCALE AND MULTIDISCIPLINARY MODELING EXPERIMENTS AND DESIGN, 2024, 7 (04) : 3919 - 3946
  • [26] Tree-Based Machine Learning Techniques for Automated Human Sleep Stage Classification
    Arslan, Recep Sinan
    Ulutas, Hasan
    Koksal, Ahmet Sertol
    Bakir, Mehmet
    Ciftci, Bulent
    TRAITEMENT DU SIGNAL, 2023, 40 (04) : 1385 - 1400
  • [27] Decision Tree-based Machine Learning Algorithm for In-node Vehicle Classification
    Ying, Kyle
    Ameri, Alireza
    Trivedi, Ankit
    Ravindra, Dilip
    Patel, Darshan
    Mozumdar, Mohammad
    2015 IEEE GREEN ENERGY AND SYSTEMS CONFERENCE (IGESC), 2015 : 71 - 76
  • [28] Malware Classification of Portable Executables using Tree-Based Ensemble Machine Learning
    Atluri, Venkata
    2019 IEEE SOUTHEASTCON, 2019
  • [29] Optimizing Binary Decision Diagrams for Interpretable Machine Learning Classification
    Cabodi, Gianpiero
    Camurati, Paolo E.
    Ignatiev, Alexey
    Marques-Silva, Joao
    Palena, Marco
    Pasini, Paolo
    PROCEEDINGS OF THE 2021 DESIGN, AUTOMATION & TEST IN EUROPE CONFERENCE & EXHIBITION (DATE 2021), 2021 : 1122 - 1125
  • [30] A model for predicting academic performance on standardised tests for lagging regions based on machine learning and Shapley additive explanations
    Suaza-Medina, Mario
    Penabaena-Niebles, Rita
    Jubiz-Diaz, Maria
    SCIENTIFIC REPORTS, 2024, 14 (01)