The classification of imbalanced large data sets based on MapReduce and ensemble of ELM classifiers

Cited by: 86
Authors
Zhai, Junhai [1 ,2 ]
Zhang, Sufang [3 ]
Wang, Chenxi [4 ]
Affiliations
[1] Hebei Univ, Coll Math & Informat Sci, Key Lab Machine Learning & Computat Intelligence, Baoding 071002, Hebei, Peoples R China
[2] Zhejiang Normal Univ, Coll Math Phys & Informat Engn, Jinhua 321004, Peoples R China
[3] China Meteorol Adm, Hebei Branch Meteorol Cadres Training Inst, Baoding 071000, Peoples R China
[4] Hebei Univ, Coll Comp Sci & Technol, Baoding 071002, Hebei, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Imbalanced large data sets; MapReduce; Extreme learning machine; Ensemble learning; Majority voting method; EXTREME LEARNING-MACHINE; BIG DATA; PERFORMANCE; TREE; UNCERTAINTY; REGRESSION; FUZZINESS; NETWORKS;
DOI
10.1007/s13042-015-0478-7
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
To classify imbalanced large data sets with two classes effectively, this paper proposes a novel algorithm consisting of four stages: (1) alternately over-sample p times between positive class instances and negative class instances; (2) construct l balanced data subsets based on the generated positive class instances; (3) train l component classifiers with the extreme learning machine (ELM) algorithm on the l constructed balanced data subsets; (4) integrate the l ELM classifiers with a simple voting approach. Specifically, in the first stage, we first calculate the center of the positive class instances and then sample instance points along the line between the center and each positive class instance. Next, for each instance point in the new positive class, we find its k nearest neighbors among the negative class instances with MapReduce, and then sample instance points along the line between the instance and its k nearest negative neighbors. This over-sampling process is repeated p times. In the second stage, we sample l times from the negative class, each time drawing as many instances as there are generated positive class instances. In each round of sampling, we combine the positive class and negative class instances, thus obtaining l balanced data subsets. The experimental results show that the proposed algorithm achieves promising speed-up and scalability, and also outperforms three other ensemble algorithms in G-mean.
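The four stages can be illustrated with a minimal single-machine sketch in Python with NumPy. It is not the authors' implementation and rests on several assumptions: only the center-based part of stage 1 is shown (the k-nearest-negative-neighbor step and the MapReduce distribution are omitted), each over-sampling round is assumed to add one synthetic point per positive instance, the ELM uses a sigmoid hidden layer with a pseudo-inverse solution, and all function names, parameter names and default values are hypothetical.

```python
import numpy as np

def oversample_toward_center(X_pos, rng):
    """One synthetic point per positive instance, on the segment to the class center."""
    center = X_pos.mean(axis=0)
    t = rng.uniform(0.0, 1.0, size=(X_pos.shape[0], 1))  # random position on each segment
    return X_pos + t * (center - X_pos)

def balanced_subsets(X_pos, X_neg, l, rng):
    """Yield l subsets: all positives plus an equal-size random sample of negatives."""
    n = X_pos.shape[0]
    for _ in range(l):
        # Sample with replacement only if there are fewer negatives than positives.
        idx = rng.choice(X_neg.shape[0], size=n, replace=n > X_neg.shape[0])
        X = np.vstack([X_pos, X_neg[idx]])
        y = np.concatenate([np.ones(n), -np.ones(n)])  # +1 positive, -1 negative
        yield X, y

class ELM:
    """Basic ELM: random hidden layer, output weights solved by pseudo-inverse."""
    def __init__(self, n_hidden, rng):
        self.n_hidden, self.rng = n_hidden, rng

    def _hidden(self, X):
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))  # sigmoid activation

    def fit(self, X, y):
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))  # fixed random weights
        self.b = self.rng.normal(size=self.n_hidden)
        self.beta = np.linalg.pinv(self._hidden(X)) @ y  # least-squares output weights
        return self

    def predict(self, X):
        return np.where(self._hidden(X) @ self.beta >= 0, 1, -1)

def train_ensemble(X_pos, X_neg, p=2, l=5, n_hidden=20, seed=0):
    rng = np.random.default_rng(seed)
    for _ in range(p):  # stage 1 (simplified): p rounds of over-sampling
        X_pos = np.vstack([X_pos, oversample_toward_center(X_pos, rng)])
    # Stages 2 and 3: one ELM per balanced subset.
    return [ELM(n_hidden, rng).fit(X, y) for X, y in balanced_subsets(X_pos, X_neg, l, rng)]

def vote(models, X):
    """Stage 4: simple majority vote over the component classifiers."""
    votes = np.sum([m.predict(X) for m in models], axis=0)
    return np.where(votes >= 0, 1, -1)
```

Calling `train_ensemble(X_pos, X_neg)` on two NumPy arrays of shape `(n, d)` returns a list of l fitted classifiers, and `vote(models, X)` returns labels in {-1, +1}. Choosing an odd l avoids tied votes.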
Pages: 1009-1017
Page count: 9
Related papers
50 in total
  • [41] A novel ensemble of classifiers for microarray data classification
    Chen, Yuehui
    Zhao, Yaou
    [J]. APPLIED SOFT COMPUTING, 2008, 8 (04) : 1664 - 1669
  • [42] An ensemble of filters and classifiers for microarray data classification
    Bolon-Canedo, V.
    Sanchez-Marono, N.
    Alonso-Betanzos, A.
    [J]. PATTERN RECOGNITION, 2012, 45 (01) : 531 - 539
  • [43] Classification with local clustering in imbalanced data sets
    Ji, Hua
    Zhang, Huaxiang
    [J]. ADVANCED RESEARCH ON INFORMATION SCIENCE, AUTOMATION AND MATERIAL SYSTEM, PTS 1-6, 2011, 219-220 : 151 - 155
  • [44] HyperSurface classifiers ensemble for high dimensional data sets
    Zhao, Xiu-Rong
    He, Qing
    Shi, Zhong-Zhi
    [J]. ADVANCES IN NEURAL NETWORKS - ISNN 2006, PT 1, 2006, 3971 : 1299 - 1304
  • [45] Ensemble Data Classification based on Diversity of Classifiers Optimized by Genetic Algorithm
    Thammasiri, Dech
    Meesad, Phayung
    [J]. MATERIALS SCIENCE AND INFORMATION TECHNOLOGY, PTS 1-8, 2012, 433-440 : 6572 - +
  • [46] Biomedical Data Classification Using Supervised Classifiers and Ensemble Based Dictionaries
    Tuysuzoglu, Goksu
    Yaslan, Yusuf
    [J]. 2017 25TH SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE (SIU), 2017,
  • [47] Recognition of Multiple Imbalanced Cancer Types Based on DNA Microarray Data Using Ensemble Classifiers
    Yu, Hualong
    Hong, Shufang
    Yang, Xibei
    Ni, Jun
    Dan, Yuanyuan
    Qin, Bin
    [J]. BIOMED RESEARCH INTERNATIONAL, 2013, 2013
  • [48] A Comprehensive Study on Ensemble-Based Imbalanced Data Classification Methods for Bankruptcy Data
    UlagaPriya, K.
    Pushpa, S.
    [J]. PROCEEDINGS OF THE 6TH INTERNATIONAL CONFERENCE ON INVENTIVE COMPUTATION TECHNOLOGIES (ICICT 2021), 2021, : 800 - 804
  • [49] A MapReduce-Based ELM for Regression in Big Data
    Wu, B.
    Yan, T. H.
    Xu, X. S.
    He, B.
    Li, W. H.
    [J]. INTELLIGENT DATA ENGINEERING AND AUTOMATED LEARNING - IDEAL 2016, 2016, 9937 : 164 - 173
  • [50] Ensemble OS-ELM based on combination weight for data stream classification
    Yu, Haiyang
    Sun, Xiaoying
    Wang, Jian
    [J]. APPLIED INTELLIGENCE, 2019, 49 (06) : 2382 - 2390