Learning from Imbalanced Data

Cited by: 5569
Authors
He, Haibo [1]
Garcia, Edwardo A. [1]
Affiliations
[1] Stevens Inst Technol, Dept Elect & Comp Engn, Hoboken, NJ 07030 USA
Keywords
Imbalanced learning; classification; sampling methods; cost-sensitive learning; kernel-based learning; active learning; assessment metrics; SUPPORT VECTOR MACHINES; CLASSIFICATION; RECOGNITION; SVM; ONLINE
DOI
10.1109/TKDE.2008.239
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification code
081104; 0812; 0835; 1405
Abstract
With the continuous expansion of data availability in many large-scale, complex, and networked systems, such as surveillance, security, Internet, and finance, it becomes critical to advance the fundamental understanding of knowledge discovery and analysis from raw data to support decision-making processes. Although existing knowledge discovery and data engineering techniques have shown great success in many real-world applications, the problem of learning from imbalanced data (the imbalanced learning problem) is a relatively new challenge that has attracted growing attention from both academia and industry. The imbalanced learning problem is concerned with the performance of learning algorithms in the presence of underrepresented data and severe class distribution skews. Due to the inherent complex characteristics of imbalanced data sets, learning from such data requires new understandings, principles, algorithms, and tools to transform vast amounts of raw data efficiently into information and knowledge representation. In this paper, we provide a comprehensive review of the development of research in learning from imbalanced data. Our focus is to provide a critical review of the nature of the problem, the state-of-the-art technologies, and the current assessment metrics used to evaluate learning performance under the imbalanced learning scenario. Furthermore, in order to stimulate future research in this field, we also highlight the major opportunities and challenges, as well as potential important research directions for learning from imbalanced data.
Pages: 1263-1284
Page count: 22
Related papers
50 results in total
  • [21] Learning From Imbalanced Data With Deep Density Hybrid Sampling
    Liu, Chien-Liang
    Chang, Yu-Hua
    [J]. IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS, 2022, 52 (11) : 7065 - 7077
  • [22] SelectNet: Learning to Sample from the Wild for Imbalanced Data Training
    Liu, Yunru
    Gao, Tingran
    Yang, Haizhao
    [J]. MATHEMATICAL AND SCIENTIFIC MACHINE LEARNING, VOL 107, 2020, 107 : 193 - 206
  • [23] Learning from imbalanced data: open challenges and future directions
    Krawczyk B.
    [J]. Progress in Artificial Intelligence, 2016, 5 (4) : 221 - 232
  • [24] Robust Multiclass Classification for Learning from Imbalanced Biomedical Data
    Piyaphol Phoungphol
    [J]. Tsinghua Science and Technology, 2012, 17 (06) : 619 - 628
  • [25] Enhancing techniques for learning decision trees from imbalanced data
    Ikram Chaabane
    Radhouane Guermazi
    Mohamed Hammami
    [J]. Advances in Data Analysis and Classification, 2020, 14 : 677 - 745
  • [26] Learning from Highly Imbalanced Big Data with Label Noise
    Johnson, Justin M.
    Kennedy, Robert K. L.
    Khoshgoftaar, Taghi M.
    [J]. INTERNATIONAL JOURNAL ON ARTIFICIAL INTELLIGENCE TOOLS, 2023, 32 (05)
  • [27] Learning from Imbalanced Data Using Methods of Sample Selection
    Chairi, Ikram
    Alaoui, Souad
    Lyhyaoui, Abdelouahid
    [J]. 2012 INTERNATIONAL CONFERENCE ON MULTIMEDIA COMPUTING AND SYSTEMS (ICMCS), 2012 : 256 - 259
  • [28] Loss Factors for Learning Boosting Ensembles from Imbalanced Data
    Soleymani, Roghayeh
    Granger, Eric
    Fumera, Giorgio
    [J]. 2016 23RD INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2016 : 204 - 209
  • [29] Towards Deeper Insights into Deep Learning from Imbalanced Data
    Song, Jie
    Shen, Yun
    Jing, Yongcheng
    Song, Mingli
    [J]. COMPUTER VISION, PT I, 2017, 771 : 674 - 684
  • [30] Label matrix normalization for semisupervised learning from imbalanced Data
    Li, Fengqi
    Li, Guangming
    Yang, Nanhai
    Xia, Feng
    Yu, Chuang
    [J]. NEW REVIEW OF HYPERMEDIA AND MULTIMEDIA, 2014, 20 (01) : 5 - 23