The classification of imbalanced large data sets based on MapReduce and ensemble of ELM classifiers

Times cited: 86

Authors:

Zhai, Junhai [1, 2]

Zhang, Sufang [3]

Wang, Chenxi [4]

Affiliations:

[1] Hebei Univ, Coll Math & Informat Sci, Key Lab Machine Learning & Computat Intelligence, Baoding 071002, Hebei, Peoples R China

[2] Zhejiang Normal Univ, Coll Math Phys & Informat Engn, Jinhua 321004, Peoples R China

[3] China Meteorol Adm, Hebei Branch Meteorol Cadres Training Inst, Baoding 071000, Peoples R China

[4] Hebei Univ, Coll Comp Sci & Technol, Baoding 071002, Hebei, Peoples R China

Source:

INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS | 2017 / Vol. 8 / No. 3

Funding:

National Natural Science Foundation of China

Keywords:

Imbalanced large data sets; MapReduce; Extreme learning machine; Ensemble learning; Majority voting method; EXTREME LEARNING-MACHINE; BIG DATA; PERFORMANCE; TREE; UNCERTAINTY; REGRESSION; FUZZINESS; NETWORKS

DOI:

10.1007/s13042-015-0478-7

Chinese Library Classification (CLC) number:

TP18 [Theory of artificial intelligence]

Discipline codes:

081104; 0812; 0835; 1405

Abstract:

Aiming to classify imbalanced two-class large data sets effectively, this paper proposes a novel algorithm consisting of four stages: (1) over-sample p times, alternating between positive class instances and negative class instances; (2) construct l balanced data subsets based on the generated positive class instances; (3) train l component classifiers with the extreme learning machine (ELM) algorithm on the l balanced data subsets; (4) integrate the l ELM classifiers by simple majority voting. Specifically, in the first stage, we first calculate the center of the positive class instances and then sample instance points along the line between the center and each positive class instance. Next, for each instance point in the new positive class, we first find its k nearest neighbors among the negative class instances with MapReduce, and then sample instance points along the lines between the instance and its k nearest negative neighbors. This over-sampling process is repeated p times. In the second stage, we sample l times from the negative class, each time drawing as many instances as there are generated positive class instances; in each round, the sampled negative instances are combined with the positive class instances, yielding l balanced data subsets. The experimental results show that the proposed algorithm achieves promising speed-up and scalability, and also outperforms three other ensemble algorithms in terms of G-mean.
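The four stages described in the abstract can be illustrated with a small pure-Python sketch (Python 3.8+ for `math.dist`). It is not the authors' implementation and makes several assumptions that the abstract does not settle. The MapReduce k-nearest-neighbour search is only simulated in one process: each split of the negative class plays the role of a mapper that emits local top-k candidates, and a merge step plays the reducer. One synthetic point is drawn per interpolation segment at a uniformly random position. The cap `neg_step` on how far a point may move toward a negative neighbour, the hidden-layer size, the regularisation constant `C`, and all function names (`oversample`, `knn_mapreduce`, `balanced_subsets`, `ELM`, `vote`, `g_mean`) are hypothetical. The ELM shown is a generic regularised variant (random sigmoid hidden layer, ridge least-squares output weights), not necessarily the one used in the paper. Labels are +1 for the positive (minority) class and -1 for the negative class.

```python
import heapq
import math
import random


def lerp(a, b, t):
    """Point a fraction t of the way along the segment from a to b."""
    return tuple(x + t * (y - x) for x, y in zip(a, b))


def knn_mapreduce(queries, negatives, k, n_splits=4):
    """Single-process simulation of MapReduce k-NN (not a Hadoop job)."""
    grouped = {}  # shuffle: candidate neighbours keyed by query index
    for split in (negatives[i::n_splits] for i in range(n_splits)):  # map: one split each
        for qi, q in enumerate(queries):
            local = heapq.nsmallest(k, split, key=lambda z: math.dist(q, z))
            grouped.setdefault(qi, []).extend(local)
    return {qi: heapq.nsmallest(k, c, key=lambda z: math.dist(queries[qi], z))  # reduce
            for qi, c in grouped.items()}


def oversample(pos, neg, p, k, rng, neg_step=0.5):
    """Stage 1 sketch: p rounds of centroid interpolation, then interpolation toward
    the k nearest negatives; neg_step is a hypothetical cap on the second step."""
    pos = list(pos)
    for _ in range(p):
        center = tuple(sum(col) / len(pos) for col in zip(*pos))
        pos = pos + [lerp(center, x, rng.random()) for x in pos]
        nn = knn_mapreduce(pos, neg, k)
        pos = pos + [lerp(pos[i], z, neg_step * rng.random())
                     for i in range(len(pos)) for z in nn[i]]
    return pos


def balanced_subsets(pos, neg, l, rng):
    """Stage 2: l subsets = all positives + an equal-size random negative sample."""
    size = min(len(pos), len(neg))
    return [(pos + rng.sample(neg, size), [1] * len(pos) + [-1] * size)
            for _ in range(l)]


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * m for a, m in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]


class ELM:
    """Minimal ELM: random sigmoid hidden layer, beta = (H^T H + I/C)^-1 H^T y."""
    def __init__(self, n_hidden, C, rng):
        self.L, self.C, self.rng = n_hidden, C, rng

    def _hidden(self, x):
        zs = (sum(w * v for w, v in zip(wj, x)) + bj for wj, bj in zip(self.W, self.b))
        return [1.0 / (1.0 + math.exp(-max(-50.0, min(50.0, z)))) for z in zs]

    def fit(self, X, y):
        self.W = [[self.rng.uniform(-1, 1) for _ in X[0]] for _ in range(self.L)]
        self.b = [self.rng.uniform(-1, 1) for _ in range(self.L)]
        H = [self._hidden(x) for x in X]
        A = [[sum(h[i] * h[j] for h in H) + (1.0 / self.C if i == j else 0.0)
              for j in range(self.L)] for i in range(self.L)]
        self.beta = solve(A, [sum(h[i] * t for h, t in zip(H, y)) for i in range(self.L)])
        return self

    def predict(self, x):
        return 1 if sum(bt * h for bt, h in zip(self.beta, self._hidden(x))) >= 0 else -1


def vote(models, x):
    """Stage 4: simple majority vote over +1/-1 outputs (use an odd l to avoid ties)."""
    return 1 if sum(m.predict(x) for m in models) > 0 else -1


def g_mean(y_true, y_pred):
    """Geometric mean of the recalls on the positive (+1) and negative (-1) classes."""
    tp = sum(t == p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == p == -1 for t, p in zip(y_true, y_pred))
    return math.sqrt(tp / y_true.count(1) * tn / y_true.count(-1))


if __name__ == "__main__":
    rng = random.Random(0)
    def blob(c, n):  # n points from a 2-D Gaussian centred at (c, c)
        return [(rng.gauss(c, 0.5), rng.gauss(c, 0.5)) for _ in range(n)]
    pos, neg = blob(3.0, 10), blob(0.0, 200)
    pos_aug = oversample(pos, neg, p=1, k=3, rng=rng)  # 10 -> 20 -> 80 positives
    models = [ELM(20, 1.0, rng).fit(X, y) for X, y in balanced_subsets(pos_aug, neg, 5, rng)]
    X_test, y_test = blob(3.0, 20) + blob(0.0, 100), [1] * 20 + [-1] * 100
    print(len(pos_aug), g_mean(y_test, [vote(models, x) for x in X_test]))
```

Under these assumptions each round grows the positive class quickly. With 10 positives, p = 1 and k = 3, the centroid step gives 20 points and the neighbour step adds 3 per point, for 80 in total, so p and k have to stay small (or the growth must be limited some other way). Stage 2 here samples negatives without replacement and caps the sample at the size of the negative class.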


Pages: 1009-1017

Number of pages: 9
