Explaining machine learning models using entropic variable projection

Cited by: 1
Authors
Bachoc, Francois [1 ]
Gamboa, Fabrice [1 ,2 ]
Halford, Max [3 ]
Loubes, Jean-Michel [1 ,2 ]
Risser, Laurent [1 ,2 ]
Affiliations
[1] Inst Math Toulouse, Toulouse, France
[2] Artificial & Natural Intelligence Toulouse Institute (3IA ANITI), Toulouse, France
[3] Inst Rech Informat Toulouse, Toulouse, France
Keywords
Explainability; Black-box decision rules; Kullback-Leibler divergence; Wasserstein distance
DOI
10.1093/imaiai/iaad010
CLC Number
O29 [Applied Mathematics]
Discipline Code
070104
Abstract
In this paper, we present a new explainability formalism designed to shed light on how each input variable of a test set impacts the predictions of machine learning models. Specifically, we propose a group explainability formalism for trained machine learning decision rules, based on their response to the variability of the input variable distributions. To emphasize the impact of each input variable, this formalism uses an information-theoretic framework that quantifies the influence of all input-output observations through entropic projections. It is thus the first unified and model-agnostic formalism enabling data scientists to interpret the dependence between the input variables, their impact on the prediction errors and their influence on the output predictions. Convergence rates of the entropic projections are provided in the large-sample case. Most importantly, we prove that computing an explanation in our framework has low algorithmic complexity, making it scalable to real-life large datasets. We illustrate our strategy by explaining complex decision rules learned with XGBoost, Random Forest or Deep Neural Network classifiers on various datasets such as Adult Income, MNIST, CelebA, Boston Housing and Iris, as well as synthetic ones. Finally, we clarify how our approach differs from the explainability strategies LIME and SHAP, which are based on single observations. Results can be reproduced using a freely distributed Python toolbox.
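The abstract describes re-weighting the test observations through an entropic (Kullback-Leibler) projection so that the distribution of one input variable is shifted, and then reading off the change in the model's average output. Below is a minimal sketch of that idea, assuming a single mean constraint, in which case the projection reduces to an exponential tilting of the empirical weights; the function names (entropic_weights, mean_prediction_under_shift) and the root-finding choice are illustrative assumptions, not the authors' toolbox API.

```python
import numpy as np
from scipy.optimize import brentq


def _tilted_weights(lam, z):
    # Exponential tilting of uniform weights; subtracting the max avoids overflow.
    a = lam * z
    a = a - a.max()
    w = np.exp(a)
    return w / w.sum()


def entropic_weights(x, target_mean):
    """Observation weights minimising the KL divergence to the uniform distribution,
    subject to the weighted mean of x equalling target_mean (entropic projection
    with a single mean constraint)."""
    z = (x - x.mean()) / (x.std() + 1e-12)   # standardise for numerical stability
    gap = lambda lam: _tilted_weights(lam, z) @ x - target_mean
    # Assumes target_mean lies strictly inside the observed range of x.
    lam = brentq(gap, -50.0, 50.0)
    return _tilted_weights(lam, z)


def mean_prediction_under_shift(predict, X, feature, targets):
    """Average model output once the test set is re-weighted so that the mean of
    one input variable matches each value in `targets`; the black-box model is
    evaluated only once, on the original observations."""
    preds = predict(X)
    x = X[:, feature]
    return np.array([entropic_weights(x, t) @ preds for t in targets])
```

For instance, sweeping `targets` over a grid of mean ages on an Adult Income test set and plotting the re-weighted average predicted probability would show how a classifier responds to a shift in the age distribution, without retraining the model or generating synthetic observations.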
Pages: 30
Related Papers
50 records in total
  • [31] Joint variable and variable projection algorithms for separable nonlinear models using Aitken acceleration technique
    Cheng, Lianyuan
    Chen, Jing
    Rong, Yingjiao
    INTERNATIONAL JOURNAL OF MODELLING IDENTIFICATION AND CONTROL, 2023, 42 (03) : 202 - 210
  • [32] Explaining poor performance of text-based machine learning models for vulnerability detection
    Napier, Kollin
    Bhowmik, Tanmay
    Chen, Zhiqian
    EMPIRICAL SOFTWARE ENGINEERING, 2024, 29 (05)
  • [33] Explaining Accurate Predictions of Multitarget Compounds with Machine Learning Models Derived for Individual Targets
    Lamens, Alec
    Bajorath, Jurgen
    MOLECULES, 2023, 28 (02):
  • [34] Explaining complex systems: a tutorial on transparency and interpretability in machine learning models (part II)
    Materassi, Donatello
    Warnick, Sean
    Rojas, Cristian
    Schoukens, Maarten
    Cross, Elizabeth
    IFAC PAPERSONLINE, 2024, 58 (15): : 497 - 501
  • [35] Expanding the concept of preventive medicine through the using explaining machine learning models in remote clinic-patient interaction
    Chizhik, Anna V.
    Egorov, Michil P.
    Vidiasova, Lyudmila A.
    PROCEEDINGS OF THE 16TH INTERNATIONAL CONFERENCE ON THEORY AND PRACTICE OF ELECTRONIC GOVERNANCE, ICEGOV 2023, 2023, : 454 - 456
  • [36] Variable selection in qualitative models via an entropic explanatory power
    Dupuis, JA
    Robert, CP
    JOURNAL OF STATISTICAL PLANNING AND INFERENCE, 2003, 111 (1-2) : 77 - 94
  • [37] Latent Variable Machine Learning Framework for Catalysis: General Models, Transfer Learning, and Interpretability
    Kayode, Gbolade O.
    Montemore, Matthew M.
    JACS AU, 2023, 4 (01): : 80 - 91
  • [38] Defeaturing of CAD Models Using Machine Learning
    Shinde, Sudhir L.
    Kukreja, Aman
    Pande, S. S.
    JOURNAL OF ADVANCED MANUFACTURING SYSTEMS, 2024, 23 (03) : 531 - 547
  • [39] Using Stacking Approaches for Machine Learning Models
    Pavlyshenko, Bohdan
    2018 IEEE SECOND INTERNATIONAL CONFERENCE ON DATA STREAM MINING & PROCESSING (DSMP), 2018, : 255 - 258
  • [40] Screening for Prediabetes Using Machine Learning Models
    Choi, Soo Beom
    Kim, Won Jae
    Yoo, Tae Keun
    Park, Jee Soo
    Chung, Jai Won
    Lee, Yong-ho
    Kang, Eun Seok
    Kim, Deok Won
    COMPUTATIONAL AND MATHEMATICAL METHODS IN MEDICINE, 2014, 2014