Learning from Imbalanced Data

Cited by: 5569
|
Authors
He, Haibo [1]
Garcia, Edwardo A. [1]
Affiliations
[1] Stevens Inst Technol, Dept Elect & Comp Engn, Hoboken, NJ 07030 USA
Keywords
Imbalanced learning; classification; sampling methods; cost-sensitive learning; kernel-based learning; active learning; assessment metrics; SUPPORT VECTOR MACHINES; CLASSIFICATION; RECOGNITION; SVM; ONLINE
DOI
10.1109/TKDE.2008.239
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
With the continuous expansion of data availability in many large-scale, complex, and networked systems, such as surveillance, security, Internet, and finance, it becomes critical to advance the fundamental understanding of knowledge discovery and analysis from raw data to support decision-making processes. Although existing knowledge discovery and data engineering techniques have shown great success in many real-world applications, the problem of learning from imbalanced data (the imbalanced learning problem) is a relatively new challenge that has attracted growing attention from both academia and industry. The imbalanced learning problem is concerned with the performance of learning algorithms in the presence of underrepresented data and severe class distribution skews. Due to the inherent complex characteristics of imbalanced data sets, learning from such data requires new understandings, principles, algorithms, and tools to transform vast amounts of raw data efficiently into information and knowledge representation. In this paper, we provide a comprehensive review of the development of research in learning from imbalanced data. Our focus is to provide a critical review of the nature of the problem, the state-of-the-art technologies, and the current assessment metrics used to evaluate learning performance under the imbalanced learning scenario. Furthermore, in order to stimulate future research in this field, we also highlight the major opportunities and challenges, as well as potential important research directions for learning from imbalanced data.
Pages: 1263-1284
Page count: 22
Related papers
50 in total
  • [1] Metric Learning from Imbalanced Data
    Gautheron, Leo
    Habrard, Amaury
    Morvant, Emilie
    Sebban, Marc
    [J]. 2019 IEEE 31ST INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE (ICTAI 2019), 2019, : 923 - 930
  • [2] Online Continual Learning from Imbalanced Data
    Chrysakis, Aristotelis
    Moens, Marie-Francine
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 119, 2020, 119
  • [3] Online Continual Learning from Imbalanced Data
    Chrysakis, Aristotelis
    Moens, Marie-Francine
    [J]. 25TH AMERICAS CONFERENCE ON INFORMATION SYSTEMS (AMCIS 2019), 2019,
  • [4] The Impact of Local Data Characteristics on Learning from Imbalanced Data
    Stefanowski, Jerzy
    [J]. ROUGH SETS AND INTELLIGENT SYSTEMS PARADIGMS, RSEISP 2014, 2014, 8537 : 1 - 13
  • [5] Learning Patterns from Imbalanced Evolving Data Streams
    Almuammar, Manal
    Fasli, Maria
    [J]. 2018 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2018, : 2048 - 2057
  • [6] Evolutionary Online Machine Learning from Imbalanced Data
    Stein, Anthony
    [J]. 2016 IEEE 1ST INTERNATIONAL WORKSHOPS ON FOUNDATIONS AND APPLICATIONS OF SELF* SYSTEMS (FAS*W), 2016, : 281 - 286
  • [7] Metric Learning from Imbalanced Data with Generalization Guarantees
    Gautheron, Leo
    Habrard, Amaury
    Morvant, Emilie
    Sebban, Marc
    [J]. PATTERN RECOGNITION LETTERS, 2020, 133 : 298 - 304
  • [8] SetConv: A New Approach for Learning from Imbalanced Data
    Gao, Yang
    Li, Yi-Fan
    Lin, Yu
    Aggarwal, Charu
    Khan, Latifur
    [J]. PROCEEDINGS OF THE 2020 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP), 2020, : 1284 - 1294
  • [9] Positive-Unlabeled Learning from Imbalanced Data
    Su, Guangxin
    Chen, Weitong
    Xu, Miao
    [J]. PROCEEDINGS OF THE THIRTIETH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2021, 2021, : 2995 - 3001
  • [10] Evaluation of Sampling Methods for Learning from Imbalanced Data
    Goel, Garima
    Maguire, Liam
    Li, Yuhua
    McLoone, Sean
    [J]. INTELLIGENT COMPUTING THEORIES, 2013, 7995 : 392 - 401