Types of minority class examples and their influence on learning classifiers from imbalanced data

Cited by: 1
Authors
Krystyna Napierala
Jerzy Stefanowski
Affiliations
[1] Poznan University of Technology,Institute of Computing Science
Keywords
Class-imbalanced data; Learning classifiers; Data difficulty factors; Local analysis; k-nearest neighbourhood;
DOI
Not available
Chinese Library Classification number
Subject classification number
Abstract
Many real-world applications reveal difficulties in learning classifiers from imbalanced data. Although several methods for improving classifiers have been introduced, identifying the conditions under which a particular method can be used efficiently is still an open research problem. It is also worth studying the nature of imbalanced data, the characteristics of the minority class distribution, and their influence on classification performance. However, current studies on imbalanced data difficulty factors have mainly been carried out with artificial datasets, and their conclusions are not easily applicable to real-world problems, partly because the methods for identifying these factors are not sufficiently developed. In our paper, we capture difficulties of class distribution in real datasets by considering four types of minority class examples: safe, borderline, rare and outliers. First, we confirm their occurrence in real data by exploring multidimensional visualizations of selected datasets. Then, we introduce a method for identifying these types of examples, which is based on analyzing the class distribution in the local neighbourhood of the considered example. Two ways of modeling this neighbourhood are presented: with k-nearest examples and with kernel functions. Experiments with artificial datasets show that these methods are able to rediscover the simulated types of examples. Further contributions of this paper include a comprehensive experimental study with 26 real-world imbalanced datasets, in which (1) we identify new data characteristics based on the analysis of the types of minority examples; and (2) we demonstrate that considering the results of this analysis allows us to differentiate the classification performance of popular classifiers and pre-processing methods and to evaluate their areas of competence. Finally, we highlight directions for exploiting the results of our analysis in developing new algorithms for learning classifiers and pre-processing methods.
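The k-nearest-examples variant mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the neighbourhood size, the distance measure, or the cut-off values, so k = 5, Euclidean distance, and the thresholds below are assumptions chosen for illustration, and the function name `label_minority_examples` is hypothetical. The sketch only shows the general idea of labelling a minority example by the share of same-class examples among its nearest neighbours; it is not the authors' exact procedure.

```python
import math


def label_minority_examples(X, y, minority_label, k=5):
    """Assign a type to every minority example from its k nearest neighbours.

    Assumptions (not stated in the abstract): Euclidean distance, k = 5,
    and thresholds on the share of same-class neighbours:
    > 0.7 safe, > 0.3 borderline, > 0.1 rare, otherwise outlier.
    Returns a dict mapping the index of each minority example to its type.
    """
    types = {}
    for i, (xi, yi) in enumerate(zip(X, y)):
        if yi != minority_label:
            continue  # only minority examples are labelled
        # Distances from example i to every other example, nearest first.
        dists = sorted(
            (math.dist(xi, xj), j) for j, xj in enumerate(X) if j != i
        )
        neighbours = [j for _, j in dists[:k]]
        same = sum(1 for j in neighbours if y[j] == minority_label)
        ratio = same / len(neighbours)
        if ratio > 0.7:
            types[i] = "safe"
        elif ratio > 0.3:
            types[i] = "borderline"
        elif ratio > 0.1:
            types[i] = "rare"
        else:
            types[i] = "outlier"
    return types
```

With k = 5, these assumed cut-offs correspond to 4 or 5 same-class neighbours for safe, 2 or 3 for borderline, 1 for rare, and 0 for outlier. The kernel-based variant named in the abstract would replace the hard k-neighbour count with distance-weighted contributions and is not sketched here.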
Pages: 563 - 597
Number of pages: 34
Related papers
50 in total
  • [1] Types of minority class examples and their influence on learning classifiers from imbalanced data
    Napierala, Krystyna
    Stefanowski, Jerzy
    [J]. JOURNAL OF INTELLIGENT INFORMATION SYSTEMS, 2016, 46 (03) : 563 - 597
  • [2] Identification of Different Types of Minority Class Examples in Imbalanced Data
    Napierala, Krystyna
    Stefanowski, Jerzy
    [J]. HYBRID ARTIFICIAL INTELLIGENT SYSTEMS, PT II, 2012, 7209 : 139 - 150
  • [3] Weighted One-Class Classification for Different Types of Minority Class Examples in Imbalanced Data
    Krawczyk, Bartosz
    Wozniak, Michal
    Herrera, Francisco
    [J]. 2014 IEEE SYMPOSIUM ON COMPUTATIONAL INTELLIGENCE AND DATA MINING (CIDM), 2014, : 337 - 344
  • [4] Oversampling With Reliably Expanding Minority Class Regions for Imbalanced Data Learning
    Zhu, Tuanfei
    Liu, Xinwang
    Zhu, En
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2023, 35 (06) : 6167 - 6181
  • [5] Imbalanced data learning by minority class augmentation using capsule adversarial networks
    Shamsolmoali, Pourya
    Zareapoor, Masoumeh
    Shen, Linlin
    Sadka, Abdul Hamid
    Yang, Jie
    [J]. NEUROCOMPUTING, 2021, 459 : 481 - 493
  • [7] Learning from Imbalanced Data in Presence of Noisy and Borderline Examples
    Napierala, Krystyna
    Stefanowski, Jerzy
    Wilk, Szymon
    [J]. ROUGH SETS AND CURRENT TRENDS IN COMPUTING, PROCEEDINGS, 2010, 6086 : 158 - 167
  • [8] Minority Class Oriented Active Learning for Imbalanced Datasets
    Aggarwal, Umang
    Popescu, Adrian
    Hudelot, Celine
    [J]. 2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2021, : 9920 - 9927
  • [9] Imbalanced Deep Learning by Minority Class Incremental Rectification
    Dong, Qi
    Gong, Shaogang
    Zhu, Xiatian
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2019, 41 (06) : 1367 - 1381
  • [10] Machine-learning classifiers for imbalanced tornado data
    Trafalis, T. B.
    Adrianto, I.
    Richman, M. B.
    Lakshmivarahan, S.
    [J]. COMPUTATIONAL MANAGEMENT SCIENCE, 2014, 11 (04) : 403 - 418