A survey on addressing high-class imbalance in big data

Cited by: 426
Authors
Leevy J.L. [1 ]
Khoshgoftaar T.M. [1 ]
Bauder R.A. [1 ]
Seliya N. [2 ]
Affiliations
[1] Florida Atlantic University, Boca Raton
[2] Ohio Northern University, Ada
Funding
U.S. National Science Foundation
Keywords
Big data; Cost-sensitive learners; Data sampling; High-class imbalance;
DOI
10.1186/s40537-018-0151-6
Abstract
In a majority–minority classification problem, class imbalance in the dataset(s) can dramatically skew the performance of classifiers, introducing a prediction bias for the majority class. Assuming the positive (minority) class is the group of interest and the given application domain dictates that a false negative is much costlier than a false positive, a negative (majority) class prediction bias could have adverse consequences. With big data, the mitigation of class imbalance poses an even greater challenge because of the varied and complex structure of the much larger datasets. This paper provides a large survey of studies published within the last 8 years, focusing on high-class imbalance (i.e., a majority-to-minority class ratio between 100:1 and 10,000:1) in big data, in order to assess the state of the art in addressing the adverse effects of class imbalance. Two categories of techniques are covered: Data-Level (e.g., data sampling) and Algorithm-Level (e.g., cost-sensitive and hybrid/ensemble) Methods. Data sampling methods are popular in addressing class imbalance, with Random Over-Sampling methods generally showing better overall results. At the Algorithm-Level, there are some outstanding performers. Yet the published studies report inconsistent and conflicting results and evaluate only a limited scope of techniques, indicating the need for more comprehensive, comparative studies. © 2018, The Author(s).
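The abstract's two central notions, the majority-to-minority class ratio and Random Over-Sampling, can be illustrated with a minimal Python sketch. This is not the survey's own code: the function names (`imbalance_ratio`, `is_high_class_imbalance`, `random_over_sample`) and the toy data are hypothetical, chosen only for illustration, and the sketch assumes over-sampling means duplicating minority rows with replacement until every class matches the majority class size.

```python
import random
from collections import Counter


def imbalance_ratio(labels):
    """Return the majority-to-minority class ratio of a label sequence."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def is_high_class_imbalance(labels, low=100, high=10_000):
    """True when the ratio lies in the 100:1 to 10,000:1 band named in the abstract."""
    return low <= imbalance_ratio(labels) <= high


def random_over_sample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class (with replacement)
    until every class has as many rows as the majority class.
    Illustrative helper, not taken from the surveyed papers."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        extra = [rng.choice(pool) for _ in range(target - n)]
        out_rows.extend(extra)
        out_labels.extend([cls] * len(extra))
    return out_rows, out_labels


# Toy data (made up): 1,000 negatives and 5 positives, i.e. a 200:1 ratio.
labels = [0] * 1000 + [1] * 5
rows = [[i] for i in range(len(labels))]
balanced_rows, balanced_labels = random_over_sample(rows, labels)
```

On the toy data the ratio is 200:1, which falls inside the high-class-imbalance band, and the over-sampled label list contains equal numbers of both classes. Duplicating rows adds no new information, so a sketch like this only rebalances class frequencies; it says nothing about which method performs best on real big data.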
Related papers
50 in total
  • [21] Addressing Class Imbalance in Software Quality Modeling
    Seliya, Naeem
    Khoshgoftaar, Taghi M.
    14TH ISSAT INTERNATIONAL CONFERENCE ON RELIABILITY AND QUALITY IN DESIGN, PROCEEDINGS, 2008, : 137 - +
  • [22] PENTHOUSE - HIGH-CLASS IMAGERY OF A MENS MAGAZINE
    GIBBONS, D
    GRAPHIS, 1980, 35 (206): : 524 - &
  • [23] The high-class concept in the culture - a legitimizing concept?
    Sellier, Veronika
    DU, 2009, (801): : 114 - 115
  • [24] BENN: Balanced Ensemble Neural Network for Handling Class Imbalance in Big Data
    Ramesh, Sneha Halebeedu
    Basava, Annappa
    Perumal, Sankar Pariserum
    EXPERT SYSTEMS, 2025, 42 (02)
  • [25] A Novel Hybrid Sampling Algorithm for Solving Class Imbalance Problem in Big Data
    Ahlawat, Khyati
    Chug, Anuradha
    Singh, Amit Prakash
    ADVANCES IN DATA SCIENCE AND ADAPTIVE ANALYSIS, 2021, 13 (02)
  • [26] High-Class Evaluation Method Based on Game Situation
    Zhou, Wei
    Liu, Minghui
    Guan, Shouping
    2010 8TH WORLD CONGRESS ON INTELLIGENT CONTROL AND AUTOMATION (WCICA), 2010, : 3195 - 3200
  • [27] Survey of Methods for Addressing Class Imbalance in Deep-Learning Based Natural Language Processing
    Henning, Sophie
    Beluch, William
    Fraser, Alexander
    Friedrich, Annemarie
    17TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EACL 2023, 2023, : 523 - 540
  • [28] Employing synthetic data for addressing the class imbalance in aspect-based sentiment classification
    Ganganwar, Vaishali
    Rajalakshmi, Ratnavel
    JOURNAL OF INFORMATION AND TELECOMMUNICATION, 2024, 8 (02) : 167 - 188
  • [29] A literature survey on various aspect of class imbalance problem in data mining
    Goswami, Shivani
    Singh, Anil Kumar
    MULTIMEDIA TOOLS AND APPLICATIONS, 2024, 83 (27) : 70025 - 70050
  • [30] THEFT OF SOHO + ARTIST COLONY TO HIGH-CLASS RESIDENTIAL
    BOWDEN, AS
    ARTNEWS, 1979, 78 (09): : 82 - &