Global-local information based oversampling for multi-class imbalanced data

Cited by: 6
Authors
Han, Mingming [1 ]
Guo, Husheng [1 ]
Li, Jinyan [3 ,4 ]
Wang, Wenjian [1 ,2 ]
Affiliations
[1] Shanxi Univ, Sch Comp & Informat Technol, Taiyuan 030006, Shanxi, Peoples R China
[2] Shanxi Univ, Key Lab Computat Intelligence & Chinese Informat P, Minist Educ, Taiyuan 030006, Shanxi, Peoples R China
[3] Univ Technol Sydney, Adv Analyt Inst, Fac Engn, Broadway, NSW, Australia
[4] Univ Technol Sydney, IT, Broadway, NSW, Australia
Funding
National Natural Science Foundation of China;
Keywords
Oversampling; Intrinsic characteristics; Synthetic strategy; OVER-SAMPLING TECHNIQUE; DATA-SETS; SMOTE; CLASSIFICATION; ENSEMBLE;
DOI
10.1007/s13042-022-01746-w
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Code
081104; 0812; 0835; 1405;
Abstract
Multi-class imbalanced classification is a challenging problem in machine learning. Many methods have been proposed to address it, and oversampling is one of the most popular, alleviating class imbalance by generating instances for the minority classes. However, most oversampling methods apply a single generation strategy to all candidate minority instances, which neglects the intrinsic characteristics of different minority class instances and can make the synthetic instances redundant or ineffective. In this work, we propose a global-local information based oversampling method, termed GLOS. We introduce a new discreteness-based metric (DID) and distinguish minority classes from majority classes by comparing the class-level discreteness values. Then, for each minority class, the difficult-to-learn instances, whose instance-level dispersion is smaller than the corresponding class-level value, are selected to generate synthetic instances, and the number of synthetic instances equals the difference between the two dispersion values. The selected instances are assigned to different groups according to their local distribution, and GLOS purposefully applies a specific synthetic strategy to each group. Finally, all minority class instances, part of the majority class instances, and the synthetic data are used as training data. In this way, both the quantity and the quality of the synthetic instances are guaranteed. Experimental results on KEEL and UCI data sets demonstrate the effectiveness of our proposal.
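The abstract only sketches GLOS at a high level and does not give the DID formula, the minority/majority criterion, or the grouping and per-group synthesis rules. The following is therefore a minimal Python sketch of a dispersion-guided oversampler in that spirit, not the paper's actual algorithm: the dispersion measure (mean distance to the class centroid), the count-based minority criterion, and the synthetic-instance budget are placeholder assumptions.

```python
# Illustrative sketch of a dispersion-guided oversampler in the spirit of GLOS.
# The DID metric, grouping rule, and per-group synthesis of the paper are not
# specified in the abstract; the quantities below are placeholder choices.
import numpy as np
from sklearn.neighbors import NearestNeighbors


def dispersion(points, centroid):
    """Mean Euclidean distance to the class centroid (assumed dispersion measure)."""
    return np.linalg.norm(points - centroid, axis=1).mean()


def oversample(X, y, k=5, seed=None):
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    mean_count = counts.mean()
    X_new, y_new = [X], [y]

    for cls, count in zip(classes, counts):
        if count >= mean_count:          # placeholder majority/minority criterion
            continue
        Xc = X[y == cls]
        centroid = Xc.mean(axis=0)
        class_disp = dispersion(Xc, centroid)

        # Instance-level dispersion: distance of each instance to the class centroid.
        inst_disp = np.linalg.norm(Xc - centroid, axis=1)
        hard = Xc[inst_disp < class_disp]   # "difficult" instances per the abstract's rule
        if len(hard) < 2:
            continue

        # SMOTE-style interpolation among the selected instances.
        n_syn = int(mean_count) - count      # placeholder for the paper's dispersion-gap budget
        nn = NearestNeighbors(n_neighbors=min(k, len(hard) - 1) + 1).fit(hard)
        _, idx = nn.kneighbors(hard)
        for _ in range(max(n_syn, 0)):
            i = rng.integers(len(hard))
            j = rng.choice(idx[i][1:])       # a same-class neighbour, excluding the point itself
            lam = rng.random()
            X_new.append((hard[i] + lam * (hard[j] - hard[i]))[None, :])
            y_new.append(np.array([cls]))

    return np.vstack(X_new), np.concatenate(y_new)
```

With `X` and `y` as a NumPy feature matrix and label vector, `oversample(X, y)` returns the augmented training set. Note that the paper also keeps only part of the majority class instances in the final training data, which this sketch omits.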
Pages: 2071-2086
Number of pages: 16
Related Papers
50 records in total
  • [1] Global-local information based oversampling for multi-class imbalanced data
    Mingming Han
    Husheng Guo
    Jinyan Li
    Wenjian Wang
    [J]. International Journal of Machine Learning and Cybernetics, 2023, 14 : 2071 - 2086
  • [2] An oversampling method for multi-class imbalanced data based on composite weights
    Deng, Mingyang
    Guo, Yingshi
    Wang, Chang
    Wu, Fuwei
    [J]. PLOS ONE, 2021, 16 (11):
  • [3] Evolutionary Mahalanobis Distance-Based Oversampling for Multi-Class Imbalanced Data Classification
    Yao, Leehter
    Lin, Tung-Bin
    [J]. SENSORS, 2021, 21 (19)
  • [4] Multi-class Imbalanced Data Oversampling for Vertebral Column Pathologies Classification
    Saez, Jose A.
    Quintian, Hector
    Krawczyk, Bartosz
    Wozniak, Michal
    Corchado, Emilio
    [J]. HYBRID ARTIFICIAL INTELLIGENT SYSTEMS (HAIS 2018), 2018, 10870 : 131 - 142
  • [5] Adversarial oversampling for multi-class imbalanced data classification with convolutional neural networks
    Wojciechowski, Adam
    Lango, Mateusz
    [J]. FOURTH INTERNATIONAL WORKSHOP ON LEARNING WITH IMBALANCED DOMAINS: THEORY AND APPLICATIONS, VOL 183, 2022, 183 : 98 - 111
  • [6] Selecting local ensembles for multi-class imbalanced data classification
    Krawczyk, Bartosz
    Cano, Alberto
    Wozniak, Michal
    [J]. 2018 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2018,
  • [7] Multi-class Boosting for Imbalanced Data
    Fernandez-Baldera, Antonio
    Buenaposada, Jose M.
    Baumela, Luis
    [J]. PATTERN RECOGNITION AND IMAGE ANALYSIS (IBPRIA 2015), 2015, 9117 : 57 - 64
  • [8] Multi-class WHMBoost: An ensemble algorithm for multi-class imbalanced data
    Zhao, Jiakun
    Jin, Ju
    Zhang, Yibo
    Zhang, Ruifeng
    Chen, Si
    [J]. INTELLIGENT DATA ANALYSIS, 2022, 26 (03) : 599 - 614
  • [9] An active learning budget-based oversampling approach for partially labeled multi-class imbalanced data streams
    Aguiar, Gabriel J.
    Cano, Alberto
    [J]. 38TH ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING, SAC 2023, 2023, : 382 - 389
  • [10] Evaluating Difficulty of Multi-class Imbalanced Data
    Lango, Mateusz
    Napierala, Krystyna
    Stefanowski, Jerzy
    [J]. FOUNDATIONS OF INTELLIGENT SYSTEMS, ISMIS 2017, 2017, 10352 : 312 - 322