On the Interpretability of Machine Learning Models and Experimental Feature Selection in Case of Multicollinear Data

Cited by: 18
Authors
Drobnic, Franc [1]
Kos, Andrej [1]
Pustisek, Matevz [1]
Affiliations
[1] Univ Ljubljana, Fac Elect Engn, Trzaska Cesta 25, Ljubljana 1000, Slovenia
Keywords
interpretable machine learning; feature multicollinearity; random forests; feature selection; feature importance; greedy feature selection;
DOI
10.3390/electronics9050761
Chinese Library Classification (CLC) Number
TP [Automation Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
In the field of machine learning, a considerable amount of research addresses the interpretability of models and their decisions. Interpretability typically comes at the cost of model quality. Random Forests are among the highest-quality machine learning techniques, but they operate as a "black box". Among the quantifiable approaches to model interpretation are measures of association between predictors and the response; for Random Forests, this usually means calculating the model's feature importances. Known methods, including the built-in one, are less suitable in settings with strong feature multicollinearity. We therefore propose an experimental approach to the feature selection task: a greedy forward feature selection method with a least-trees-used criterion. It yields a set of the most informative features that can be used in a machine learning (ML) training process with prediction quality similar to that of the original feature set. We verify the proposed method on two well-known datasets, one with low feature multicollinearity and one with high feature multicollinearity. The method also allows a domain expert to help select among equally important features, an approach known as human-in-the-loop.
Pages: 15
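
The abstract describes a greedy forward feature selection procedure built around Random Forests. As a rough illustration only, the sketch below implements generic greedy forward selection with scikit-learn; the scoring criterion (cross-validated accuracy), the bundled breast-cancer dataset, and the stopping rule are assumptions for the example and are not the paper's least-trees-used criterion or its experimental data.

# Minimal sketch of greedy forward feature selection with a Random Forest,
# in the spirit of the abstract above. Assumptions: the scoring criterion is
# cross-validated accuracy (NOT the paper's least-trees-used criterion), the
# dataset is scikit-learn's bundled breast-cancer data (not the paper's), and
# selection stops once adding a feature no longer improves the score.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

selected = []                          # indices of features chosen so far
remaining = list(range(X.shape[1]))    # candidate features not yet selected
best_overall = -np.inf

while remaining:
    # Score every single-feature extension of the currently selected subset.
    round_scores = []
    for f in remaining:
        candidate = selected + [f]
        model = RandomForestClassifier(n_estimators=100, random_state=0)
        score = cross_val_score(model, X[:, candidate], y, cv=5).mean()
        round_scores.append((score, f))
    best_score, best_feature = max(round_scores)
    if best_score <= best_overall:     # no further improvement: stop
        break
    best_overall = best_score
    selected.append(best_feature)
    remaining.remove(best_feature)

print("Selected feature indices:", selected)
print("Cross-validated accuracy of the reduced feature set:", round(best_overall, 4))

Each round greedily adds the single feature that most improves the cross-validated score, which is the forward-selection skeleton named in the keywords; swapping in the paper's least-trees-used criterion would presumably replace the scoring step inside the loop.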
Related Papers (50 records in total)
  • [21] Data Classification Using Feature Selection And kNN Machine Learning Approach
    Begum, Shemim
    Chakraborty, Debasis
    Sarkar, Ram
    [J]. 2015 INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND COMMUNICATION NETWORKS (CICN), 2015, : 811 - 814
  • [22] Feature Selection in Pulmonary Function Test Data with Machine Learning Methods
    Karakis, Rukiye
    Guler, Inan
    Isik, Ali Hakan
    [J]. 2013 21ST SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE (SIU), 2013,
  • [23] Data Driven Feature Selection for Machine Learning Algorithms in Computer Vision
    Zhang, Fan
    Li, Wei
    Zhang, Yifan
    Feng, Zhiyong
    [J]. IEEE INTERNET OF THINGS JOURNAL, 2018, 5 (06): 4262 - 4272
  • [24] Probabilistic Feature Selection in Machine Learning
    Ghosh, Indrajit
    [J]. ARTIFICIAL INTELLIGENCE AND SOFT COMPUTING, ICAISC 2018, PT I, 2018, 10841 : 623 - 632
  • [25] A Deep Feature Learning Model for Pneumonia Detection Applying a Combination of mRMR Feature Selection and Machine Learning Models
    Togacar, M.
    Ergen, B.
    Comert, Z.
    Ozyurt, F.
    [J]. IRBM, 2020, 41 (04): 212 - 222
  • [26] A Comprehensive Review of Feature Selection and Feature Selection Stability in Machine Learning
    Buyukkececi, Mustafa
    Okur, Mehmet Cudi
    [J]. GAZI UNIVERSITY JOURNAL OF SCIENCE, 2023, 36 (04): 1506 - 1520
  • [27] Accuracy, Fairness, and Interpretability of Machine Learning Criminal Recidivism Models
    Ingram, Eric
    Gursoy, Furkan
    Kakadiaris, Ioannis A.
    [J]. 2022 IEEE/ACM INTERNATIONAL CONFERENCE ON BIG DATA COMPUTING, APPLICATIONS AND TECHNOLOGIES, BDCAT, 2022, : 233 - 241
  • [28] Feature fusion improves performance and interpretability of machine learning models in identifying soil pollution of potentially contaminated sites
    Lu, Xiaosong
    Du, Junyang
    Zheng, Liping
    Wang, Guoqing
    Li, Xuzhi
    Sun, Li
    Huang, Xinghua
    [J]. ECOTOXICOLOGY AND ENVIRONMENTAL SAFETY, 2023, 259
  • [29] Applying Genetic Programming to Improve Interpretability in Machine Learning Models
    Ferreira, Leonardo Augusto
    Guimaraes, Frederico Gadelha
    Silva, Rodrigo
    [J]. 2020 IEEE CONGRESS ON EVOLUTIONARY COMPUTATION (CEC), 2020,
  • [30] Approach to provide interpretability in machine learning models for image classification
    Stadlhofer, Anja
    Mezhuyev, Vitaliy
    [J]. INDUSTRIAL ARTIFICIAL INTELLIGENCE, 1 (1):