Learning from class-imbalanced data: Review of methods and applications

Times cited: 1277
Authors
Guo Haixiang [1, 2, 3]
Li Yijing [1, 2]
Shang, Jennifer [4]
Gu Mingyun [1]
Huang Yuanyue [1]
Bing, Gong [5]
Affiliations
[1] China Univ Geosci, Coll Econ & Management, Wuhan 430074, Peoples R China
[2] China Univ Geosci, Res Ctr Digital Business Management, Wuhan 430074, Peoples R China
[3] China Univ Geosci Wuhan, Mineral Resource Strategy & Policy Res Ctr, Wuhan 430074, Peoples R China
[4] Univ Pittsburgh, Joseph M Katz Grad Sch Business, Pittsburgh, PA 15260 USA
[5] Univ Politecn Madrid, ETS Ind Engn, Dept Ind Engn Business Adm & Stat, C Jose Gutierrez Abascal 2, Madrid 28006, Spain
Funding
National Natural Science Foundation of China
Keywords
Rare events; Imbalanced data; Machine learning; Data mining; SUPPORT VECTOR MACHINE; COST-SENSITIVE CLASSIFIER; CARD FRAUD DETECTION; OIL-SPILL DETECTION; FEATURE-SELECTION; DATA-SETS; NEURAL-NETWORKS; CAUSE IDENTIFICATION; CHURN PREDICTION; FAULT-DIAGNOSIS
DOI
10.1016/j.eswa.2016.12.035
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Rare events, especially those that could negatively impact society, often require a human decision-making response. Detecting rare events can be viewed as a prediction task in the data mining and machine learning communities. Because these events are rarely observed in daily life, the prediction task suffers from a lack of balanced data. In this paper, we provide an in-depth review of rare event detection from an imbalanced learning perspective. Five hundred and seventeen related papers published in the past decade were collected for the study. Initial statistics suggest that rare event detection and imbalanced learning are of concern across a wide range of research areas, from management science to engineering. We reviewed all collected papers from both a technical and a practical point of view. The modeling methods discussed include data preprocessing, classification algorithms and model evaluation. For applications, we first provide a comprehensive taxonomy of the existing application domains of imbalanced learning and then detail the applications in each category. Finally, suggestions from the reviewed papers are combined with our own experience and judgment to offer further research directions for the imbalanced learning and rare event detection fields. (C) 2016 Elsevier Ltd. All rights reserved.
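Model evaluation is one of the technique groups the abstract lists. The minimal Python sketch below illustrates why evaluation needs care on imbalanced data. It is not code from the paper: the helper names `confusion_counts` and `imbalance_metrics` and the 95/5 label split are hypothetical. The sketch compares plain accuracy with minority-class recall, specificity, and the geometric mean (G-mean) of the two. G-mean is one common imbalance-aware metric, chosen here for illustration. The classifier being scored always predicts the majority class.

```python
import math

def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FN, TN, FP, treating `positive` as the rare (minority) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    return tp, fn, tn, fp

def imbalance_metrics(y_true, y_pred, positive=1):
    """Hypothetical helper: accuracy alongside per-class rates and G-mean."""
    tp, fn, tn, fp = confusion_counts(y_true, y_pred, positive)
    accuracy = (tp + tn) / len(y_true)
    recall = tp / (tp + fn) if tp + fn else 0.0        # true positive rate on the rare class
    specificity = tn / (tn + fp) if tn + fp else 0.0   # true negative rate on the majority class
    g_mean = math.sqrt(recall * specificity)           # zero if either class is never recognized
    return {"accuracy": accuracy, "recall": recall,
            "specificity": specificity, "g_mean": g_mean}

# Hypothetical data: 95 majority (0) samples and 5 rare-event (1) samples.
y_true = [0] * 95 + [1] * 5
# A degenerate classifier that always predicts the majority class.
y_pred = [0] * 100

print(imbalance_metrics(y_true, y_pred))
# prints {'accuracy': 0.95, 'recall': 0.0, 'specificity': 1.0, 'g_mean': 0.0}
```

The 95% accuracy hides the fact that no rare event was detected. Recall and G-mean expose this directly, which is why imbalanced-learning studies usually report class-sensitive measures alongside, or instead of, accuracy.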
Pages: 220-239
Number of pages: 20
Related papers (50 in total)
  • [1] Learning from class-imbalanced data: review of data driven methods and algorithm driven methods
    Huang, Cui Yin
    Dai, Hong Liang
    [J]. DATA SCIENCE IN FINANCE AND ECONOMICS, 2021, 1 (01): 21-36
  • [2] Learning from class-imbalanced data in wireless sensor networks
    Radivojac, P
    Korad, U
    Sivalingam, KM
    Obradovic, Z
    [J]. 2003 IEEE 58TH VEHICULAR TECHNOLOGY CONFERENCE, VOLS 1-5, PROCEEDINGS, 2003: 3030-3034
  • [3] Methods for class-imbalanced learning with support vector machines: a review and an empirical evaluation
    Rezvani, Salim
    Pourpanah, Farhad
    Lim, Chee Peng
    Wu, Q. M. Jonathan
    [J]. SOFT COMPUTING, 2024, 28 (20): 11873-11894
  • [4] Learning Fairly With Class-Imbalanced Data for Interference Coordination
    Guo, Jia
    Xu, Zhaoqi
    Yang, Chenyang
    [J]. IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY, 2021, 70 (07): 7176-7181
  • [5] Margin calibration in SVM class-imbalanced learning
    Yang, Chan-Yun
    Yang, Jr-Syu
    Wang, Jian-Jun
    [J]. NEUROCOMPUTING, 2009, 73 (1-3): 397-411
  • [6] Prototypical Classifier for Robust Class-Imbalanced Learning
    Wei, Tong
    Shi, Jiang-Xin
    Li, Yu-Feng
    Zhang, Min-Ling
    [J]. ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PAKDD 2022, PT II, 2022, 13281: 44-57
  • [7] Exploring of clustering algorithm on class-imbalanced data
    Li Xuan
    Chen Zhigang
    Yang Fan
    [J]. PROCEEDINGS OF THE 2013 8TH INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE & EDUCATION (ICCSE 2013), 2013: 89-93
  • [8] RDPVR: Random Data Partitioning with Voting Rule for Machine Learning from Class-Imbalanced Datasets
    Hassanat, Ahmad B.
    Tarawneh, Ahmad S.
    Abed, Samer Subhi
    Altarawneh, Ghada Awad
    Alrashidi, Malek
    Alghamdi, Mansoor
    [J]. ELECTRONICS, 2022, 11 (02)
  • [9] Learning from class-imbalanced data using misclassification-focusing generative adversarial networks
    Yun, Jaesub
    Lee, Jong-Seok
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2024, 240
  • [10] Learning from Class-imbalanced Data with a Model-Agnostic Framework for Machine Intelligent Diagnosis
    Wu, Jingyao
    Zhao, Zhibin
    Sun, Chuang
    Yan, Ruqiang
    Chen, Xuefeng
    [J]. RELIABILITY ENGINEERING & SYSTEM SAFETY, 2021, 216