Techniques to Improve Ecological Interpretability of Black-Box Machine Learning Models: Case Study on Biological Health of Streams in the United States with Gradient Boosted Trees

Cited by: 0
Authors
Welchowski, Thomas [1 ]
Maloney, Kelly O. [2 ]
Mitchell, Richard [3 ]
Schmid, Matthias [1 ]
Affiliations
[1] Univ Bonn, Med Fac, Dept Med Biometry Informat & Epidemiol, Venusberg Campus 1, D-53127 Bonn, Germany
[2] US Geol Survey USGS, Eastern Ecol Sci Ctr, Leetown Res Lab, 11649 Leetown Rd, Kearneysville, WV 25430 USA
[3] US EPA, Office of Water, Washington, DC 20460 USA
Keywords
Boosting; Interpretable machine learning; Interaction terms; Macroinvertebrates; Stream health; Impervious cover; Land use; Stream; Trees; Classification
DOI
10.1007/s13253-021-00479-7
Chinese Library Classification: Q [Biological Sciences]
Subject classification codes: 07; 0710; 09
Abstract
Statistical modeling of ecological data often involves a large number of variables as well as possible nonlinear relationships and higher-order interaction effects. Gradient boosted trees (GBT) have been successful in addressing these issues and have shown good predictive performance in modeling nonlinear relationships, in particular in classification settings with a categorical response variable. They also tend to be robust against outliers. However, their black-box nature makes these models difficult to interpret. We introduce several recently developed statistical tools to the environmental research community in order to advance interpretation of these black-box models. To analyze the properties of the tools, we applied gradient boosted trees to investigate the biological health of streams within the contiguous USA, as measured by a benthic macroinvertebrate biotic index. Based on these data and a simulation study, we demonstrate the advantages and limitations of partial dependence plots (PDP), individual conditional expectation (ICE) curves and accumulated local effects (ALE) in their ability to identify covariate-response relationships. Additionally, interaction effects were quantified according to interaction strength (IAS) and Friedman's H² statistic. Interpretable machine learning techniques are useful tools to open the black box of gradient boosted trees in the environmental sciences. This finding is supported by our case study on the effect of impervious surface on benthic condition, which agrees with previous results in the literature. Overall, the most important variables were ecoregion, bed stability, watershed area, riparian vegetation and catchment slope. These variables were also present in most identified interaction effects. In conclusion, graphical tools (PDP, ICE, ALE) enable visualization and easier interpretation of GBT but should be supported by analytical statistical measures. Future methodological research is needed to investigate the properties of interaction tests.
Journal of Agricultural, Biological and Environmental Statistics, 2022, 27(1): 175-197 (23 pages)
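
The abstract names four inspection tools: PDP, ICE, ALE, and Friedman's H² statistic. The sketch below is not the authors' code; it is a minimal Python illustration, assuming scikit-learn >= 1.0 and synthetic data, of how each quantity can be computed for a fitted GBT. The helper names ale_1d and h2_pair are hypothetical, the ALE centering is a simplified unweighted variant, and H² is estimated by brute-force partial dependence on a subsample.

# A minimal sketch, not the authors' code: fit a gradient boosted tree
# classifier on synthetic data, then inspect it with PDP/ICE curves,
# a hand-rolled 1-D ALE estimate, and a brute-force Friedman H^2.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import PartialDependenceDisplay

X, y = make_classification(n_samples=500, n_features=5, random_state=0)
gbt = GradientBoostingClassifier(random_state=0).fit(X, y)

# PDP (average effect) and ICE (one curve per observation) for feature 0.
PartialDependenceDisplay.from_estimator(gbt, X, features=[0], kind="both")
plt.show()

def ale_1d(model, X, j, n_bins=10):
    # Accumulated local effects for feature j: average the model's finite
    # differences within quantile bins, then accumulate and center.
    z = np.quantile(X[:, j], np.linspace(0, 1, n_bins + 1))  # bin edges
    effects = []
    for k in range(n_bins):
        in_bin = (X[:, j] >= z[k]) & (X[:, j] <= z[k + 1])
        if not in_bin.any():
            effects.append(0.0)
            continue
        lo, hi = X[in_bin].copy(), X[in_bin].copy()
        lo[:, j], hi[:, j] = z[k], z[k + 1]
        diff = model.predict_proba(hi)[:, 1] - model.predict_proba(lo)[:, 1]
        effects.append(diff.mean())
    ale = np.cumsum(effects)
    return z[1:], ale - ale.mean()  # simplified (unweighted) centering

def h2_pair(model, X, j, k, n=200, seed=0):
    # Friedman's H^2 for the pair (j, k): the share of the joint partial
    # dependence variance not explained by the two one-dimensional PDs.
    rng = np.random.default_rng(seed)
    S = X[rng.choice(len(X), size=min(n, len(X)), replace=False)]

    def pd_at(cols, row):
        # Brute-force partial dependence: fix the columns in `cols` at
        # this row's values and average predictions over the subsample S.
        Xm = S.copy()
        Xm[:, cols] = row[cols]
        return model.predict_proba(Xm)[:, 1].mean()

    pj = np.array([pd_at([j], row) for row in S])
    pk = np.array([pd_at([k], row) for row in S])
    pjk = np.array([pd_at([j, k], row) for row in S])
    pj, pk, pjk = pj - pj.mean(), pk - pk.mean(), pjk - pjk.mean()
    return ((pjk - pj - pk) ** 2).sum() / (pjk ** 2).sum()

grid, ale = ale_1d(gbt, X, j=0)
print("ALE at upper bin edges:", np.round(ale, 3))
print("H^2 for features (0, 1):", round(h2_pair(gbt, X, 0, 1), 3))

An H² value near 0 means the joint effect of the pair is essentially additive in the two univariate effects, while values approaching 1 indicate a strong interaction. The IAS statistic mentioned in the abstract serves a similar screening purpose for a feature's overall interaction strength and is not sketched here.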