On the Interpretability of Machine Learning Models and Experimental Feature Selection in Case of Multicollinear Data

Cited by: 18
Authors
Drobnic, Franc [1]
Kos, Andrej [1]
Pustisek, Matevz [1]
Affiliations
[1] Univ Ljubljana, Fac Elect Engn, Trzaska Cesta 25, Ljubljana 1000, Slovenia
Keywords
interpretable machine learning; feature multicollinearity; random forests; feature selection; feature importance; greedy feature selection;
DOI
10.3390/electronics9050761
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
In machine learning, a considerable amount of research addresses the interpretability of models and their decisions. Interpretability tends to conflict with model quality. Random Forests are among the highest-quality machine learning techniques, but they operate as a "black box". Among the quantifiable approaches to model interpretation are measures of association between predictors and response. For Random Forests, this approach usually consists of calculating the model's feature importances. Known methods, including the built-in one, are less suitable in settings with strong multicollinearity of features. Therefore, we propose an experimental approach to the feature selection task: a greedy forward feature selection method with a least-trees-used criterion. It yields a set of the most informative features that can be used in a machine learning (ML) training process with prediction quality similar to that of the original feature set. We verify the results of the proposed method on two known datasets, one with small feature multicollinearity and another with large feature multicollinearity. The proposed method also allows a domain expert to help select among equally important features, which is known as the human-in-the-loop approach.
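The abstract names a greedy forward feature selection method with a least-trees-used criterion but does not spell out the procedure. The following is a minimal sketch of one possible reading, not the authors' implementation. It assumes that each candidate feature set can be scored by the number of trees a forest needs to reach some target prediction quality, and that the feature whose addition lowers that number the most is selected next. The names `greedy_forward_selection`, `trees_needed`, `tie_breaker`, and `toy_trees_needed` are hypothetical. The toy cost function stands in for real forest training so that the example runs without any ML library.

```python
from typing import Callable, List, Optional, Sequence


def greedy_forward_selection(
    candidates: Sequence[str],
    trees_needed: Callable[[List[str]], float],
    tie_breaker: Optional[Callable[[List[str]], str]] = None,
) -> List[str]:
    """Greedily add the feature that minimises the assumed 'trees needed' cost.

    trees_needed: hypothetical callable returning how many trees a forest
        trained on the given features needs to reach a target quality.
    tie_breaker: optional callable (e.g. a domain expert's choice) that picks
        one feature when several are equally good; defaults to the first.
    """
    selected: List[str] = []
    remaining = list(candidates)
    best_cost = float("inf")

    while remaining:
        # Cost of the current selection extended by each remaining feature.
        costs = {f: trees_needed(selected + [f]) for f in remaining}
        lowest = min(costs.values())
        if lowest >= best_cost:
            break  # no remaining feature reduces the number of trees needed

        # Equally good candidates, typical for multicollinear features.
        tied = [f for f in remaining if costs[f] == lowest]
        if tie_breaker is not None and len(tied) > 1:
            chosen = tie_breaker(tied)  # human-in-the-loop decision point
        else:
            chosen = tied[0]

        selected.append(chosen)
        remaining.remove(chosen)
        best_cost = lowest

    return selected


def toy_trees_needed(features: List[str]) -> float:
    """Toy stand-in (assumption): 'a' and 'a_copy' carry the same information,
    'b' carries independent information, 'noise' carries none."""
    cost = 100
    if "a" in features or "a_copy" in features:
        cost -= 40
    if "b" in features:
        cost -= 30
    return cost


FEATURES = ["a", "a_copy", "b", "noise"]

default_pick = greedy_forward_selection(FEATURES, toy_trees_needed)
expert_pick = greedy_forward_selection(
    FEATURES, toy_trees_needed, tie_breaker=lambda tied: tied[-1]
)

print(default_pick)  # ['a', 'b']
print(expert_pick)   # ['a_copy', 'b']
```

In this toy setting the two collinear features `a` and `a_copy` tie at the first step, which is where a domain expert could choose between them. Either choice leads to a two-feature set with the same cost. In practice `trees_needed` would presumably train forests of increasing size on real data, which this sketch leaves out.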
Pages: 15