A survey on addressing high-class imbalance in big data

Cited by: 426
Authors
Leevy J.L. [1 ]
Khoshgoftaar T.M. [1 ]
Bauder R.A. [1 ]
Seliya N. [2 ]
Affiliations
[1] Florida Atlantic University, Boca Raton
[2] Ohio Northern University, Ada
Funding
U.S. National Science Foundation;
Keywords
Big data; Cost-sensitive learners; Data sampling; High-class imbalance;
DOI
10.1186/s40537-018-0151-6
Abstract
In a majority–minority classification problem, class imbalance in the dataset(s) can dramatically skew classifier performance, introducing a prediction bias toward the majority class. If the positive (minority) class is the group of interest and the application domain dictates that a false negative is much costlier than a false positive, a prediction bias toward the negative (majority) class can have adverse consequences. With big data, mitigating class imbalance poses an even greater challenge because of the varied and complex structure of the much larger datasets. This paper provides a large survey of studies published within the last 8 years that focus on high-class imbalance (i.e., a majority-to-minority class ratio between 100:1 and 10,000:1) in big data, in order to assess the state of the art in addressing the adverse effects of class imbalance. Two categories of techniques are covered: Data-Level methods (e.g., data sampling) and Algorithm-Level methods (e.g., cost-sensitive and hybrid/ensemble). Data sampling methods are popular for addressing class imbalance, with Random Over-Sampling methods generally showing better overall results. At the Algorithm-Level, there are some outstanding performers. Yet the published studies report inconsistent and conflicting results and evaluate a limited range of techniques, indicating the need for more comprehensive, comparative studies. © 2018, The Author(s).
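As an illustration of the two families of methods named in the abstract, the following is a minimal sketch, not the survey authors' implementation. It assumes a toy binary dataset with a 100:1 majority-to-minority ratio; the function names (`imbalance_ratio`, `random_over_sample`, `inverse_frequency_weights`) are hypothetical and chosen only for this example. The first two functions show a Data-Level approach (Random Over-Sampling); the third shows one simple Algorithm-Level heuristic, inverse-frequency class weights that a cost-sensitive learner could consume.

```python
import random
from collections import Counter


def imbalance_ratio(labels):
    """Return the majority-to-minority class count ratio."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def random_over_sample(rows, labels, seed=0):
    """Data-Level: duplicate randomly chosen rows of each smaller class
    until every class has as many rows as the majority class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        extra = [rng.choice(pool) for _ in range(target - n)]
        out_rows.extend(extra)
        out_labels.extend([cls] * len(extra))
    return out_rows, out_labels


def inverse_frequency_weights(labels):
    """Algorithm-Level: per-class weights proportional to 1 / class frequency,
    so errors on the rare class cost more (an assumed, simple heuristic)."""
    counts = Counter(labels)
    total = len(labels)
    return {cls: total / (len(counts) * n) for cls, n in counts.items()}


# Toy data (assumed): 1,000 negatives (0) and 10 positives (1), a 100:1 ratio.
labels = [0] * 1000 + [1] * 10
rows = [[float(i)] for i in range(len(labels))]

balanced_rows, balanced_labels = random_over_sample(rows, labels)
weights = inverse_frequency_weights(labels)
```

Over-sampling leaves the original rows untouched and only appends duplicates, so it enlarges the training set; with big data that extra volume is a cost the weighting approach avoids, since it changes only the loss contribution of each class.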
Related papers
50 in total
  • [31] Addressing barriers to big data
    Alharthi, Abdulkhaliq
    Krotov, Vlad
    Bowman, Michael
    BUSINESS HORIZONS, 2017, 60 (03) : 285 - 292
  • [32] Modeling of "high-class feeling" on a cosmetic package design
    Tobitani K.
    Shiraiwa A.
    Katahira K.
    Nagata N.
    Nikata K.
    Arakawa K.
    Seimitsu Kogaku Kaishi/Journal of the Japan Society for Precision Engineering, 2021, 87 (01) : 134 - 139
  • [33] Training Mode and Quality View of High-Class Talents
    Gao, Xia
    Wang, Yufang
    Lou, Bingna
    International Journal of Emerging Technologies in Learning, 2022, 17 (13) : 186 - 199
  • [34] THEY SAID YOU WAS HIGH-CLASS + SOCIAL-MOBILITY AND THE CHANGING CLASS SYSTEM
    ARISTIDES
    AMERICAN SCHOLAR, 1986, 55 (02): : 151 - &
  • [35] Benchmarking framework for class imbalance problem using novel sampling approach for big data
    Ahlawat, Khyati
    Chug, Anuradha
    Singh, Amit Prakash
    INTERNATIONAL JOURNAL OF SYSTEM ASSURANCE ENGINEERING AND MANAGEMENT, 2019, 10 (04) : 824 - 835
  • [37] An Empirical Study on Data Sampling Methods in Addressing Class Imbalance Problem in Software Defect Prediction
    Odejide, Babajide J.
    Bajeh, Amos O.
    Balogun, Abdullateef O.
    Alanamu, Zubair O.
    Adewole, Kayode S.
    Akintola, Abimbola G.
    Salihu, Shakirat A.
    Usman-Hamza, Fatima E.
    Mojeed, Hammed A.
    SOFTWARE ENGINEERING PERSPECTIVES IN SYSTEMS, VOL. 1, 2022, 501 : 594 - 610
  • [38] Survey on deep learning with class imbalance
    Johnson, Justin M.
    Khoshgoftaar, Taghi M.
    JOURNAL OF BIG DATA, 2019, 6 (01)
  • [39] Addressing Class Imbalance Problem in Health Data Classification: Practical Application From an Oversampling Viewpoint
    Agyemang, Edmund Fosu
    Mensah, Joseph Agyapong
    Nyarko, Eric
    Arku, Dennis
    Mbeah-Baiden, Benedict
    Opoku, Enock
    Nortey, Ezekiel Nii Noye
    APPLIED COMPUTATIONAL INTELLIGENCE AND SOFT COMPUTING, 2025, 2025 (01)