Exploiting domain knowledge to address class imbalance and a heterogeneous feature space in multi-class classification

Authors
Hirsch, Vitali [2]
Reimann, Peter [1]
Treder-Tschechlov, Dennis [2]
Schwarz, Holger [2]
Mitschang, Bernhard [2]
Affiliations
[1] Univ Stuttgart, Grad Sch Excellence Adv Mfg Engn GSaME, Stuttgart, Germany
[2] Univ Stuttgart, Inst Parallel & Distributed Syst IPVS, Stuttgart, Germany
Source
VLDB JOURNAL | 2023 / Vol. 32 / No. 5
Keywords
Classification; Domain knowledge; Multi-class imbalance; Heterogeneous feature space; SYSTEMS; SKIN;
DOI
10.1007/s00778-023-00780-6
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Real-world data of multi-class classification tasks often show complex data characteristics that lead to reduced classification performance. Major analytical challenges are a high degree of multi-class imbalance within the data and a heterogeneous feature space, which increases the number and complexity of class patterns. Existing solutions for classification or data pre-processing address only one of these two challenges in isolation. We propose a novel classification approach that explicitly addresses both challenges, multi-class imbalance and a heterogeneous feature space, together. As its main contribution, this approach exploits domain knowledge in the form of a taxonomy to systematically prepare the training data. Based on an experimental evaluation on both real-world data and several synthetically generated data sets, we show that our approach outperforms any other classification technique in terms of accuracy. Furthermore, it entails considerable practical benefits in real-world use cases, e.g., it reduces the rework required in product quality control.
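To make the imbalance problem concrete, the sketch below shows one generic way a class taxonomy can restructure a multi-class training set: leaf labels are lifted to their parent node for a coarse first-level problem, and each parent then defines a smaller sub-problem over its own leaves. This is an illustrative assumption, not the authors' method; the label names, the `taxonomy` mapping, and the helpers `imbalance_ratio`, `lift_to_parent`, and `subproblems` are all hypothetical. The imbalance ratio used here (largest class count divided by smallest) is one common measure, chosen for simplicity.

```python
from collections import Counter


def imbalance_ratio(labels):
    """Largest class count divided by smallest class count."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def lift_to_parent(labels, taxonomy):
    """Replace each leaf label by its parent node in a (hypothetical) taxonomy."""
    return [taxonomy[label] for label in labels]


def subproblems(labels, taxonomy):
    """Group leaf labels by parent node: one smaller classification task per parent."""
    groups = {}
    for label in labels:
        groups.setdefault(taxonomy[label], []).append(label)
    return groups


# Hypothetical leaf-level labels, e.g. defect codes in product quality control
labels = ["A1"] * 90 + ["A2"] * 5 + ["B1"] * 3 + ["B2"] * 2
# Hypothetical taxonomy: leaf class -> parent category
taxonomy = {"A1": "A", "A2": "A", "B1": "B", "B2": "B"}

print(imbalance_ratio(labels))                            # 45.0 (flat, 4 classes)
print(imbalance_ratio(lift_to_parent(labels, taxonomy)))  # 19.0 (A vs. B)
for parent, group in subproblems(labels, taxonomy).items():
    print(parent, imbalance_ratio(group))                 # A 18.0, then B 1.5
```

In this toy example, no single task is as skewed as the flat 45:1 problem, which illustrates why domain knowledge about class relationships can help. Whether such a decomposition improves accuracy on real data depends on the taxonomy and the classifiers used.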
Pages: 1037-1064
Number of pages: 28