Interactive visual formula composition of multidimensional data classifiers

Authors
Derstroff, Adrian [1]
Leistikow, Simon [1]
Nahardani, Ali [2]
Gruen, Katja [3]
Franz, Marcus [3]
Hoerr, Verena [2]
Linsen, Lars [1]
Affiliations
[1] Univ Munster, Inst Comp Sci, Einsteinstr 62, D-48149 Munster, Nordrhein Westf, Germany
[2] Univ Hosp Bonn, Heart Ctr Bonn, Dept Internal Med 2, Bonn, Germany
[3] Jena Univ Hosp, Dept Internal Med 1, Div Cardiol Angiol Pneumol & Intens Med Care, Jena, Germany
Keywords
Classification; feature space; formulas; multidimensional data; visual analysis; feature selection; relevance
DOI
10.1177/14738716241270288
Chinese Library Classification number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
Understanding how a classification result is generated and what role individual features play in it is crucial in many applications, particularly in medical contexts such as the translation of diagnostic biomarkers into clinical practice. The goal is to find (ideally simple) relationships between the features in multidimensional data and the classification that explain the underlying phenomenon. Mathematical formulas can express these relationships and serve as classifiers. However, infinitely many formulas can be built from the given features, and they carry an inherent trade-off between complexity and accuracy. We present an interactive visual approach that helps domain experts mitigate this trade-off. Core to our approach is a novel feature selection method, from which formulas are composed using symbolic regression, with state-of-the-art classifiers serving as a reference. To evaluate our approach and compare its classification performance with that of other state-of-the-art feature selection techniques, we test our methods on well-known machine learning data sets. Our evaluation shows that our feature selection method performs better than random feature selection for data sets with many features or when a low number of generations in the symbolic regression is required. It also consistently matches or outperforms state-of-the-art methods. In addition, we apply our approach in a case study to a hemodynamic cohort data set and report our findings and domain expert feedback. Our approach found formulas containing features that agree with the literature, as well as formulas that achieved a better micro-averaged F1 score than established histological indices.
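As an illustration of two ideas named in the abstract, a formula used as a classifier and the micro-averaged F1 score, the following minimal Python sketch may help. The formula `2*x0 - x1`, the threshold at zero, and the toy data are hypothetical and are not taken from the paper; the authors' feature selection and symbolic regression steps are not reproduced here.

```python
from typing import Sequence, Tuple


def formula_classifier(x: Tuple[float, float]) -> int:
    """Hypothetical formula classifier: predict class 1 if 2*x0 - x1 > 0."""
    score = 2.0 * x[0] - x[1]
    return 1 if score > 0 else 0


def micro_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Micro-averaged F1: pool TP, FP, and FN over all classes, then compute F1."""
    labels = set(y_true) | set(y_pred)
    tp = fp = fn = 0
    for label in labels:
        for t, p in zip(y_true, y_pred):
            if p == label and t == label:
                tp += 1  # predicted this class and it is correct
            elif p == label and t != label:
                fp += 1  # predicted this class but the truth differs
            elif t == label and p != label:
                fn += 1  # missed an instance of this class
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy data (assumed for illustration only): two features per sample.
X = [(1.0, 0.5), (0.2, 1.0), (0.9, 2.5), (0.1, 0.1)]
y_true = [1, 0, 1, 1]
y_pred = [formula_classifier(x) for x in X]

score = micro_f1(y_true, y_pred)
print(y_pred, score)
```

For single-label problems like this toy case, the micro-averaged F1 score coincides with plain accuracy, because every misclassified sample counts once as a false positive for the predicted class and once as a false negative for the true class.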
Pages: 42-61
Page count: 20