Exploiting domain knowledge to address class imbalance and a heterogeneous feature space in multi-class classification

Times cited: 0
Authors
Hirsch, Vitali [2]
Reimann, Peter [1]
Treder-Tschechlov, Dennis [2]
Schwarz, Holger [2]
Mitschang, Bernhard [2]
Affiliations
[1] University of Stuttgart, Graduate School of Excellence Advanced Manufacturing Engineering (GSaME), Stuttgart, Germany
[2] University of Stuttgart, Institute for Parallel and Distributed Systems (IPVS), Stuttgart, Germany
Source
The VLDB Journal, 2023, Vol. 32, No. 5
Keywords
Classification; Domain knowledge; Multi-class imbalance; Heterogeneous feature space; Systems; Skin
DOI
10.1007/s00778-023-00780-6
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Real-world data of multi-class classification tasks often show complex data characteristics that lead to reduced classification performance. Major analytical challenges are a high degree of multi-class imbalance within the data and a heterogeneous feature space, which increases the number and complexity of class patterns. Existing solutions for classification or data pre-processing address only one of these two challenges in isolation. We propose a novel classification approach that explicitly addresses both challenges, multi-class imbalance and a heterogeneous feature space, together. As its main contribution, this approach exploits domain knowledge in the form of a taxonomy to systematically prepare the training data. Based on an experimental evaluation on both real-world data and several synthetically generated data sets, we show that our approach outperforms any other classification technique in terms of accuracy. Furthermore, it entails considerable practical benefits in real-world use cases; e.g., it reduces the rework required in product quality control.
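The abstract names multi-class imbalance and taxonomy-based preparation of training data without giving details. As a rough illustration only, the Python sketch below is not the authors' algorithm, which the abstract does not specify. It measures imbalance with the common imbalance ratio (largest class size divided by smallest class size) and decomposes a flat multi-class problem along a two-level taxonomy. This is one generic way to use such domain knowledge: a top-level data set labeled by taxonomy group, plus one data set per group. The names `imbalance_ratio`, `split_by_taxonomy`, `TAXONOMY`, `SAMPLES`, and the defect labels are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# A sample is (feature vector, fine-grained class label).
Sample = Tuple[List[float], str]


def imbalance_ratio(labels: List[str]) -> float:
    """Size of the largest class divided by the size of the smallest class."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def split_by_taxonomy(
    samples: List[Sample], taxonomy: Dict[str, str]
) -> Tuple[List[Sample], Dict[str, List[Sample]]]:
    """Decompose one flat multi-class problem along a two-level taxonomy.

    Returns a top-level data set relabeled with taxonomy groups, plus one
    data set per group that keeps only that group's fine-grained labels.
    """
    top_level = [(x, taxonomy[y]) for x, y in samples]
    per_group: Dict[str, List[Sample]] = defaultdict(list)
    for x, y in samples:
        per_group[taxonomy[y]].append((x, y))
    return top_level, dict(per_group)


# Hypothetical quality-control taxonomy: fine-grained defect -> defect group.
TAXONOMY = {
    "scratch": "surface",
    "dent": "surface",
    "short_circuit": "electrical",
    "open_circuit": "electrical",
}

# Toy data with one made-up feature per sample, not taken from the paper.
# List repetition shares the feature lists, which is fine since none are mutated.
SAMPLES: List[Sample] = (
    [([0.1], "scratch")] * 4
    + [([0.2], "dent")] * 4
    + [([0.8], "short_circuit")]
    + [([0.9], "open_circuit")]
)

if __name__ == "__main__":
    top, groups = split_by_taxonomy(SAMPLES, TAXONOMY)
    print("flat IR:", imbalance_ratio([y for _, y in SAMPLES]))
    print("top-level IR:", imbalance_ratio([y for _, y in top]))
    for group, subset in sorted(groups.items()):
        print(group, "IR:", imbalance_ratio([y for _, y in subset]))
```

On this toy data, each per-group sub-problem is balanced and has fewer classes, while the imbalance is concentrated in the top-level split. A real approach would then choose how to train and combine classifiers for each data set, which this sketch leaves open.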
Pages: 1037-1064
Page count: 28