The classification of imbalanced large data sets based on MapReduce and ensemble of ELM classifiers

Times cited: 86
Authors
Zhai, Junhai [1, 2]
Zhang, Sufang [3]
Wang, Chenxi [4]
Affiliations
[1] Hebei Univ, Coll Math & Informat Sci, Key Lab Machine Learning & Computat Intelligence, Baoding 071002, Hebei, Peoples R China
[2] Zhejiang Normal Univ, Coll Math Phys & Informat Engn, Jinhua 321004, Peoples R China
[3] China Meteorol Adm, Hebei Branch Meteorol Cadres Training Inst, Baoding 071000, Peoples R China
[4] Hebei Univ, Coll Comp Sci & Technol, Baoding 071002, Hebei, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Imbalanced large data sets; MapReduce; Extreme learning machine; Ensemble learning; Majority voting method; EXTREME LEARNING-MACHINE; BIG DATA; PERFORMANCE; TREE; UNCERTAINTY; REGRESSION; FUZZINESS; NETWORKS;
DOI
10.1007/s13042-015-0478-7
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Aiming to classify imbalanced large data sets with two classes effectively, this paper proposes a novel algorithm consisting of four stages: (1) alternately over-sample p times between the positive class instances and the negative class instances; (2) construct l balanced data subsets based on the generated positive class instances; (3) train l component classifiers with the extreme learning machine (ELM) algorithm on the l balanced data subsets; and (4) integrate the l ELM classifiers with a simple voting approach. Specifically, in the first stage, we first calculate the center of the positive class instances and then sample instance points along the line between the center and each positive class instance. Next, for each instance point in the new positive class, we find its k nearest neighbors among the negative class instances with MapReduce, and then sample instance points along the line between the instance and each of its k nearest negative neighbors. This over-sampling process is repeated p times. In the second stage, we sample instances l times from the negative class, each sample having the same size as the generated positive class. In each round of sampling, the positive class instances and the sampled negative class instances are combined, which yields l balanced data subsets. The experimental results show that the proposed algorithm achieves promising speed-up and scalability, and also outperforms three other ensemble algorithms in terms of G-mean.
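The four stages can be illustrated with a minimal NumPy sketch. This is our reading of the abstract, not the authors' implementation, and every name in it (`knn_mapreduce`, `oversample_round`, `ELM`, `train_ensemble`, `vote`, `g_mean`) is hypothetical. Assumptions not stated in the abstract: interpolation factors are drawn uniformly from [0, 1); one synthetic point is generated per segment; MapReduce is only simulated in-process (each chunk of the negative class acts as a mapper returning local k nearest candidates, and a reducer merges them into the global top k); the ELM uses a sigmoid hidden layer with Gaussian random weights and least-squares output weights; and a tied vote goes to the positive class. G-mean is computed as the square root of the product of the true positive rate and the true negative rate.

```python
import numpy as np


def knn_mapreduce(queries, negatives, k, n_splits=4):
    """Indices of the k nearest negatives per query (map/reduce simulated in-process)."""
    cand_d, cand_i = [], []
    for idx in np.array_split(np.arange(len(negatives)), n_splits):
        if len(idx) == 0:
            continue
        # Map: distances from every query to this chunk; keep the local top k
        d = np.linalg.norm(queries[:, None, :] - negatives[idx][None, :, :], axis=2)
        local = np.argsort(d, axis=1)[:, :min(k, len(idx))]
        cand_d.append(np.take_along_axis(d, local, axis=1))
        cand_i.append(idx[local])
    # Reduce: merge all local candidates and keep the global k nearest
    d_all = np.concatenate(cand_d, axis=1)
    i_all = np.concatenate(cand_i, axis=1)
    best = np.argsort(d_all, axis=1)[:, :k]
    return np.take_along_axis(i_all, best, axis=1)


def oversample_round(pos, neg, k, rng):
    """One over-sampling round: center-based, then k-NN-based interpolation."""
    center = pos.mean(axis=0)
    # One synthetic point on the segment between the center and each positive instance
    r = rng.random((len(pos), 1))
    pos = np.vstack([pos, center + r * (pos - center)])
    # One synthetic point on the segment between each positive point and
    # each of its k nearest negative neighbors
    nn = knn_mapreduce(pos, neg, k)          # shape (n_pos, k)
    anchors = np.repeat(pos, k, axis=0)      # row i repeated k times, aligned with nn.ravel()
    r = rng.random((len(anchors), 1))
    return np.vstack([pos, anchors + r * (neg[nn.ravel()] - anchors)])


class ELM:
    """Basic single-hidden-layer ELM for labels in {-1, +1} (assumed setup)."""

    def __init__(self, n_hidden, rng):
        self.n_hidden, self.rng = n_hidden, rng

    def _hidden(self, X):
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, y):
        # Random, untrained input weights and biases
        self.W = self.rng.standard_normal((X.shape[1], self.n_hidden))
        self.b = self.rng.standard_normal(self.n_hidden)
        # Output weights by least squares via the Moore-Penrose pseudoinverse
        self.beta = np.linalg.pinv(self._hidden(X)) @ y
        return self

    def predict(self, X):
        return np.where(self._hidden(X) @ self.beta >= 0, 1, -1)


def train_ensemble(pos, neg, p=1, k=3, l=5, n_hidden=20, seed=0):
    rng = np.random.default_rng(seed)
    for _ in range(p):                        # Stage 1: p over-sampling rounds
        pos = oversample_round(pos, neg, k, rng)
    models = []
    for _ in range(l):                        # Stage 2: l balanced subsets
        replace = len(pos) > len(neg)         # sample with replacement only if needed
        idx = rng.choice(len(neg), size=len(pos), replace=replace)
        X = np.vstack([pos, neg[idx]])
        y = np.concatenate([np.ones(len(pos)), -np.ones(len(pos))])
        models.append(ELM(n_hidden, rng).fit(X, y))   # Stage 3: one ELM per subset
    return models


def vote(models, X):
    """Stage 4: majority vote; a tie goes to the positive class (assumption)."""
    votes = np.sum([m.predict(X) for m in models], axis=0)
    return np.where(votes >= 0, 1, -1)


def g_mean(y_true, y_pred):
    tpr = np.mean(y_pred[y_true == 1] == 1)
    tnr = np.mean(y_pred[y_true == -1] == -1)
    return np.sqrt(tpr * tnr)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    neg = rng.normal(0.0, 1.0, size=(400, 2))   # synthetic majority class
    pos = rng.normal(2.5, 0.5, size=(20, 2))    # synthetic minority class
    models = train_ensemble(pos, neg)
    X = np.vstack([pos, neg])
    y = np.concatenate([np.ones(20), -np.ones(400)])
    print("G-mean:", round(float(g_mean(y, vote(models, X))), 3))
```

Choosing an odd l avoids tied votes altogether. In this sketch each round multiplies the positive class size by 2(k + 1), so p and k should stay small; the abstract does not state the sampling rates the authors actually use.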
Pages: 1009-1017
Number of pages: 9
Journal: International Journal of Machine Learning and Cybernetics, 2017, 8: 1009-1017