Fuzzy integral-based ELM ensemble for imbalanced big data classification

Cited by: 38
Authors
Zhai, Junhai [1]
Zhang, Sufang [2]
Zhang, Mingyang [1]
Liu, Xiaomeng [1]
Affiliations
[1] Hebei Univ, Coll Math & Informat Sci, Key Lab Machine Learning & Computat Intelligence, Baoding 071002, Hebei, Peoples R China
[2] China Meteorol Adm, Training Ctr, Hebei Branch, Baoding 071000, Hebei, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Imbalanced big data; MapReduce; Non-iterative learning; Oversampling; Fuzzy integral; EXTREME LEARNING-MACHINE; DATA-SETS; MAPREDUCE; APPROXIMATION; UNCERTAINTY; REDUCTION; NETWORKS; SYSTEMS; MODEL;
DOI
10.1007/s00500-018-3085-1
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Big data are data too large to be handled and analyzed by traditional software tools. Big data can be characterized by five V's: volume, velocity, variety, value and veracity. However, in the real world, some big data have another feature, namely class imbalance; for example, e-health big data, credit card fraud detection big data and extreme weather forecast big data are all class imbalanced. To deal with the problem of classifying binary imbalanced big data, this paper proposes a promising algorithm based on MapReduce, non-iterative learning, ensemble learning and oversampling, which includes three stages. First, for each positive instance, its enemy nearest neighbor is found with MapReduce, and p positive instances are randomly generated with uniform distribution in its enemy nearest neighbor hypersphere, i.e., p positive instances are oversampled within the hypersphere. Second, l balanced data subsets are constructed and l classifiers are trained on the constructed data subsets with a non-iterative learning approach. Finally, the trained classifiers are integrated by fuzzy integral to classify unseen instances. We experimentally compared the proposed algorithm with three related algorithms: SMOTE, SMOTE+RF-BigData and MR-V-ELM, and conducted a statistical analysis on the experimental results. The experimental results and the statistical analysis demonstrate that the proposed algorithm outperforms the other three methods.
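The oversampling stage described in the abstract can be illustrated with a small single-machine sketch. It is not the authors' MapReduce implementation. It assumes that the "enemy nearest neighbor hypersphere" of a positive instance is the ball centered at that instance whose radius is the Euclidean distance to its nearest negative-class instance; the abstract does not spell out this definition. The function names (`enemy_nearest_distance`, `sample_in_ball`, `oversample`) are hypothetical and chosen only for illustration.

```python
import math
import random


def enemy_nearest_distance(x, negatives):
    """Euclidean distance from positive instance x to its nearest negative instance."""
    return min(math.dist(x, n) for n in negatives)


def sample_in_ball(center, radius, rng):
    """Draw one point uniformly from the ball of the given radius around center."""
    d = len(center)
    # A normalized Gaussian vector gives a uniformly distributed direction.
    direction = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(v * v for v in direction))
    # Scaling the radius by U^(1/d) makes the point uniform in volume.
    r = radius * rng.random() ** (1.0 / d)
    return [c + r * v / norm for c, v in zip(center, direction)]


def oversample(positives, negatives, p, seed=0):
    """Generate p synthetic positive instances per original positive instance.

    Assumption: the hypersphere is centered at the positive instance and its
    radius is the distance to the nearest negative ("enemy") instance.
    """
    rng = random.Random(seed)
    synthetic = []
    for x in positives:
        radius = enemy_nearest_distance(x, negatives)
        synthetic.extend(sample_in_ball(x, radius, rng) for _ in range(p))
    return synthetic


if __name__ == "__main__":
    positives = [[0.0, 0.0], [5.0, 5.0]]
    negatives = [[1.0, 0.0], [5.0, 7.0]]
    for point in oversample(positives, negatives, p=3):
        print(point)
```

In the paper the nearest-neighbor search is carried out with MapReduce; the brute-force `min` over all negative instances here stands in for that step and would not scale to big data.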
Pages: 3519-3531
Number of pages: 13
Related papers
50 in total
  • [1] Fuzzy integral-based ELM ensemble for imbalanced big data classification
    Junhai Zhai
    Sufang Zhang
    Mingyang Zhang
    Xiaomeng Liu
    [J]. Soft Computing, 2018, 22 : 3519 - 3531
  • [2] Fuzzy Integral-Based Multi-Classifiers Ensemble for Android Malware Classification
    Taha, Altyeb
    Barukab, Omar
    Malebary, Sharaf
    [J]. MATHEMATICS, 2021, 9 (22)
  • [3] The classification of imbalanced large data sets based on MapReduce and ensemble of ELM classifiers
    Zhai, Junhai
    Zhang, Sufang
    Wang, Chenxi
    [J]. INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2017, 8 (03) : 1009 - 1017
  • [4] The classification of imbalanced large data sets based on MapReduce and ensemble of ELM classifiers
    Junhai Zhai
    Sufang Zhang
    Chenxi Wang
    [J]. International Journal of Machine Learning and Cybernetics, 2017, 8 : 1009 - 1017
  • [5] Ensemble RBM-based classifier using fuzzy integral for big data classification
    Junhai Zhai
    Xu Zhou
    Sufang Zhang
    Tingting Wang
    [J]. International Journal of Machine Learning and Cybernetics, 2019, 10 : 3327 - 3337
  • [6] Ensemble RBM-based classifier using fuzzy integral for big data classification
    Zhai, Junhai
    Zhou, Xu
    Zhang, Sufang
    Wang, Tingting
    [J]. INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2019, 10 (11) : 3327 - 3337
  • [7] Fuzzy Integral-based Neural Network Ensemble for Facial Expression Recognition
    Wang, Z. Y.
    Xiao, N. F.
    [J]. PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON COMPUTER INFORMATION SYSTEMS AND INDUSTRIAL APPLICATIONS (CISIA 2015), 2015, 18 : 709 - 712
  • [8] Binary imbalanced big data classification based on fuzzy data reduction and classifier fusion
    Zhai, Junhai
    Wang, Mohan
    Zhang, Sufang
    [J]. SOFT COMPUTING, 2022, 26 (06) : 2781 - 2792
  • [9] Binary imbalanced big data classification based on fuzzy data reduction and classifier fusion
    Junhai Zhai
    Mohan Wang
    Sufang Zhang
    [J]. Soft Computing, 2022, 26 : 2781 - 2792
  • [10] Constructing Support Vector Machines Ensemble Classification Method for Imbalanced Datasets Based on Fuzzy Integral
    Chen, Pu
    Zhang, Dayong
    [J]. MODERN ADVANCES IN APPLIED INTELLIGENCE, IEA/AIE 2014, PT I, 2014, 8481 : 70 - 76