The Impact of Variable Selection and Transformation on the Interpretability and Accuracy of Fuzzy Models

Cited by: 0

Authors:

Fuchs, Caro ^{[1,2]}

Spolaor, Simone ^{[3]}

Kaymak, Uzay ^{[1]}

Nobile, Marco S. ^{[2,4,5]}

Affiliations:

[1] Eindhoven Univ Technol, Jheronimus Acad Data Sci, 's-Hertogenbosch, Netherlands

[2] Eindhoven Univ Technol, Dept Ind Engn & Innovat Sci, Eindhoven, Netherlands

[3] Eindhoven Univ Technol, Dept Mech Engn, Microsyst, Eindhoven, Netherlands

[4] Ca Foscari Univ Venice, Dept Environm Sci Informat & Stat, Venice, Italy

[5] Bicocca Bioinformat Biostat & Bioimaging Ctr B4, Milan, Italy

Source:

2022 IEEE CONFERENCE ON COMPUTATIONAL INTELLIGENCE IN BIOINFORMATICS AND COMPUTATIONAL BIOLOGY (IEEE CIBCB 2022) | 2022

Funding:

EU Horizon 2020;

Keywords:

interpretable AI; data transformation; log-transformation; data normalization; machine learning; genetic algorithm; fuzzy model; fuzzy logic;

DOI:

10.1109/CIBCB55180.2022.9863019

Chinese Library Classification:

TP18 [Artificial Intelligence Theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Data transformation is an important step in Machine Learning pipelines and can strongly improve their performance. For instance, min-max normalization is often used to make all variables lie in the same range, while log-transformation is used to map data that are scattered across several orders of magnitude to a logarithmic space. Such transformations can be beneficial when the machine learning approach measures distance in a metric space, as cluster-based approaches do. These two transformations can be combined to reveal hidden patterns in log-normally distributed data, which commonly occur in biological and medical datasets. In this work, we introduce a novel evolutionary approach designed to automatically determine the optimal log-transformation and selection of variables. Our approach is built around an interpretable AI system (created by pyFUME), so that all transformations are followed by inverse transformations that map the values back into the original universe of discourse and preserve the interpretability of the results. We test our approach on two synthetic datasets, designed to reproduce a condition in which some variables are normally distributed, some variables are log-normally distributed, and some variables are just noise. Our results show that our approach yields better-performing models than conventional methods, and that the resulting model is also more interpretable, making this approach particularly useful for studying biomedical datasets.
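The combination of log-transformation, min-max normalization, and the inverse mapping back to the original universe of discourse can be illustrated with a minimal sketch. This is not the authors' implementation and does not use the pyFUME API; the function names `log_minmax_forward` and `log_minmax_inverse`, the base-10 logarithm, and the sample values are assumptions chosen only for illustration.

```python
import math
from typing import List, Tuple


def log_minmax_forward(values: List[float]) -> Tuple[List[float], float, float]:
    """Log-transform strictly positive values, then min-max scale them to [0, 1].

    Hypothetical helper: returns the scaled values together with the minimum
    and maximum of the logged data, which are needed to invert the mapping.
    """
    if any(v <= 0 for v in values):
        raise ValueError("log-transformation needs strictly positive values")
    logged = [math.log10(v) for v in values]
    lo, hi = min(logged), max(logged)
    if hi == lo:
        raise ValueError("min-max normalization needs at least two distinct values")
    scaled = [(x - lo) / (hi - lo) for x in logged]
    return scaled, lo, hi


def log_minmax_inverse(scaled: List[float], lo: float, hi: float) -> List[float]:
    """Undo the scaling and the logarithm to return to the original units."""
    return [10 ** (s * (hi - lo) + lo) for s in scaled]


# Assumed sample: values spread over four orders of magnitude.
raw = [0.01, 0.1, 1.0, 10.0, 100.0]
scaled, lo, hi = log_minmax_forward(raw)
restored = log_minmax_inverse(scaled, lo, hi)
```

In this sketch the scaled values are evenly spaced in [0, 1] even though the raw values are not, and `restored` matches `raw` up to floating-point error. That round trip is the property that would let a model trained in the transformed space still be read in the original units.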


Pages: 155-162

Page count: 8
