Survey on Highly Imbalanced Multi-class Data

Cited by: 0
Authors
Hamid, Hakim Abdul [1 ,2 ]
Yusoff, Marina [3 ]
Mohamed, Azlinah [3 ]
Affiliations
[1] Univ Teknikal Malaysia Melaka UTeM, INSFORNET, C ACT, Hang Tuah Jaya, Melaka, Malaysia
[2] Univ Teknikal Malaysia Melaka UTeM, Fak Teknol Maklumat & Komunikasi FTMK, Hang Tuah Jaya, Melaka, Malaysia
[3] Univ Teknol MARA UiTM, Inst Big Data Analyt & Artificial Intelligence, Shah Alam, Selangor, Malaysia
Keywords
Imbalanced data; highly imbalanced data; highly imbalanced multi-class; data strategies; FEATURE-SELECTION; CLASSIFICATION SYSTEMS; ENSEMBLE SELECTION; CREDIT RISK; DATA-SETS; MACHINE; MODELS; ALGORITHM; NETWORKS; DIMENSIONALITY
DOI
10.14569/IJACSA.2022.0130627
Chinese Library Classification number
TP301 [Theory, methods]
Discipline classification code
081202
Abstract
Machine learning has a massive impact on society because it offers solutions to many complicated problems such as classification, clustering analysis, and prediction, especially during the COVID-19 pandemic. Data distribution is an essential aspect of machine learning in providing unbiased solutions. From the earliest literature on highly imbalanced data until recently, machine learning research has focused mostly on binary classification problems. Research on highly imbalanced multi-class data remains largely unexplored, even though handling Big Data requires better analysis and prediction. This study reviews the models and techniques for handling highly imbalanced multi-class data, along with their strengths, weaknesses, and related domains. Furthermore, the paper uses a statistical method to explore a case study with a severely imbalanced dataset. This article aims to (1) understand the trend of highly imbalanced multi-class data through analysis of the related literature; (2) analyze previous and current methods of handling highly imbalanced multi-class data; and (3) construct a framework for highly imbalanced multi-class data. The chosen highly imbalanced multi-class dataset is also analyzed and adapted to current machine learning methods and techniques, followed by a discussion of open challenges and future directions for highly imbalanced multi-class data. Finally, the paper presents a novel framework for highly imbalanced multi-class data. We hope this research provides insight into the potential development of better methods and techniques for handling and manipulating highly imbalanced multi-class data.
Pages: 211-229
Page count: 19
Related papers
50 in total
  • [1] A survey of multi-class imbalanced data classification methods
    Han, Meng
    Li, Ang
    Gao, Zhihui
    Mu, Dongliang
    Liu, Shujuan
    [J]. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2023, 44 (02) : 2471 - 2501
  • [2] Multi-class Boosting for Imbalanced Data
    Fernandez-Baldera, Antonio
    Buenaposada, Jose M.
    Baumela, Luis
    [J]. PATTERN RECOGNITION AND IMAGE ANALYSIS (IBPRIA 2015), 2015, 9117 : 57 - 64
  • [3] Multi-class WHMBoost: An ensemble algorithm for multi-class imbalanced data
    Zhao, Jiakun
    Jin, Ju
    Zhang, Yibo
    Zhang, Ruifeng
    Chen, Si
    [J]. INTELLIGENT DATA ANALYSIS, 2022, 26 (03) : 599 - 614
  • [4] Evaluating Difficulty of Multi-class Imbalanced Data
    Lango, Mateusz
    Napierala, Krystyna
    Stefanowski, Jerzy
    [J]. FOUNDATIONS OF INTELLIGENT SYSTEMS, ISMIS 2017, 2017, 10352 : 312 - 322
  • [5] Accurate and efficient sequential ensemble learning for highly imbalanced multi-class data
    Vong, Chi-Man
    Du, Jie
    [J]. NEURAL NETWORKS, 2020, 128 : 268 - 278
  • [6] An Algorithm for Selective Preprocessing of Multi-class Imbalanced Data
    Wojciechowski, Szymon
    Wilk, Szymon
    Stefanowski, Jerzy
    [J]. PROCEEDINGS OF THE 10TH INTERNATIONAL CONFERENCE ON COMPUTER RECOGNITION SYSTEMS CORES 2017, 2018, 578 : 238 - 247
  • [7] A Dynamic Sampling Framework for Multi-Class Imbalanced Data
    Debowski, B.
    Areibi, S.
    Grewal, G.
    Tempelman, J.
    [J]. 2012 11TH INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS (ICMLA 2012), VOL 2, 2012, : 113 - 118
  • [8] Multi-class imbalanced big data classification on Spark
    Sleeman, William C.
    Krawczyk, Bartosz
    [J]. KNOWLEDGE-BASED SYSTEMS, 2021, 212
  • [9] Density-Based Clustering to Deal with Highly Imbalanced Data in Multi-Class Problems
    Mondragon, Julio Cesar Munguia
    Lara, Erendira Rendon
    Eleuterio, Roberto Alejo
    Gutierrez, Everardo Efren Granda
    Lopez, Federico Del Razo
    [J]. MATHEMATICS, 2023, 11 (18)
  • [10] A Combination Method for Multi-Class Imbalanced Data Classification
    Li, Hu
    Zou, Peng
    Han, Weihong
    Xia, Rongze
    [J]. 2013 10TH WEB INFORMATION SYSTEM AND APPLICATION CONFERENCE (WISA 2013), 2013, : 365 - 368