BENN: Balanced Ensemble Neural Network for Handling Class Imbalance in Big Data

Authors
Ramesh, Sneha Halebeedu [1, 2]
Basava, Annappa [1 ]
Perumal, Sankar Pariserum [1 ]
Affiliations
[1] Natl Inst Technol Karnataka, Dept Comp Sci & Engn, Surathkal, India
[2] Nitte Meenakshi Inst Technol, Dept Informat Sci & Engn, Bengaluru, India
Keywords
concept drift; decision tree regression; decision trees; machine learning; national health dataset; random forest
DOI
10.1111/exsy.13754
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Class imbalance is a critical challenge in big data analytics. Because many machine learning models aim to minimise overall error, they tend to be biased towards the majority class: they perform well on it but poorly on the minority class. This paper presents the Balanced Ensemble Neural Network (BENN), a novel solution for addressing class imbalance in big data. BENN combines neural networks with ensemble learning and incorporates class-balancing strategies to ensure fair representation of minority classes. The methodology integrates multiple neural networks, each trained on a balanced subset of the data obtained with techniques such as the Synthetic Minority Over-sampling Technique (SMOTE) and random undersampling. This integration aims to leverage the strengths of the individual networks while reducing their inherent biases. In extensive experiments across various datasets, BENN achieves an AUC-ROC score of 0.94, surpassing random forest (0.88), support vector machine (0.84) and a single neural network (0.80). BENN also outperforms traditional neural network models and standard ensemble methods on key metrics: accuracy, precision, recall, F1-score and AUC-ROC. The results specifically highlight BENN's effectiveness in accurately classifying minority-class instances, a notable challenge for many existing models. These findings underscore BENN's potential as a substantial advance in handling class imbalance in big data environments and suggest a promising direction for future research and application in machine learning.
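The abstract describes the methodology only at a high level, so the following is a minimal sketch of the general idea (several neural networks, each trained on a class-balanced subset, combined by averaging), not the authors' implementation. Everything concrete here is an assumption made for illustration: the function names `balanced_subset`, `fit_balanced_ensemble` and `predict_proba_ensemble` are hypothetical, scikit-learn's `MLPClassifier` stands in for whatever network architecture the paper uses, only random undersampling is shown (SMOTE is omitted), and the data is synthetic, so the printed score has no relation to the figures reported in the abstract.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier


def balanced_subset(X, y, rng):
    """Random undersampling: draw as many rows from every class
    as the smallest class has (hypothetical helper)."""
    classes, counts = np.unique(y, return_counts=True)
    n_min = counts.min()
    idx = np.concatenate([
        rng.choice(np.flatnonzero(y == c), size=n_min, replace=False)
        for c in classes
    ])
    rng.shuffle(idx)
    return X[idx], y[idx]


def fit_balanced_ensemble(X, y, n_members=5, seed=0):
    """Train one small network per independently drawn balanced subset."""
    rng = np.random.default_rng(seed)
    members = []
    for m in range(n_members):
        Xb, yb = balanced_subset(X, y, rng)
        # Assumed architecture: one hidden layer of 16 units.
        net = MLPClassifier(hidden_layer_sizes=(16,), max_iter=500,
                            random_state=seed + m)
        members.append(net.fit(Xb, yb))
    return members


def predict_proba_ensemble(members, X):
    """Soft voting: average the positive-class probability over members."""
    return np.mean([net.predict_proba(X)[:, 1] for net in members], axis=0)


# Synthetic imbalanced data (roughly 95% / 5%), purely for illustration.
X, y = make_classification(n_samples=2000, n_features=10,
                           weights=[0.95, 0.05], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, test_size=0.25, random_state=0)

members = fit_balanced_ensemble(X_train, y_train)
scores = predict_proba_ensemble(members, X_test)
print("AUC-ROC on synthetic data:", roc_auc_score(y_test, scores))
```

Because each member sees a different random draw of majority-class rows, averaging their outputs uses more of the majority data than any single undersampled model would, which is one common motivation for balanced ensembles of this kind.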
Pages: 13