MapReduce Distributed Highly Random Fuzzy Forest for Noisy Big Data

Cited by: 0
Authors
Mustafic, Faruk [1]
Xiong, Ning [1]
Herrera, Francisco [2]
Gallego, Sergio Ramírez [2]
Affiliations
[1] Malardalen Univ, Hgsk Pl 1, S-72123 Vasteras, Sweden
[2] Univ Granada, Ave Hosp,S-N, Granada 18010, Spain
Funding
Swedish Research Council;
Keywords
random forest; fuzzy decision tree; highly random fuzzy forest; noisy Big Data; attribute noise; CLASSIFICATION;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The amount of data available to us is growing ever faster, and such data often contain noise; we call them noisy Big Data. There is an increasing need for learning methods that can handle noisy Big Data in classification tasks. In this paper we propose a highly random fuzzy forest algorithm for learning an ensemble of fuzzy decision trees from a big data set contaminated with attribute noise. We also present a distributed version of the proposed learning algorithm implemented in the MapReduce framework. Experimental results demonstrate that the proposed algorithm is faster and more accurate than the state-of-the-art approach, particularly in the presence of attribute noise.
Pages: 560-567
Page count: 8
Related papers
50 in total
  • [1] On the use of MapReduce for imbalanced big data using Random Forest
    del Rio, Sara
    Lopez, Victoria
    Manuel Benitez, Jose
    Herrera, Francisco
    INFORMATION SCIENCES, 2014, 285 : 112 - 137
  • [2] A Mapreduce Fuzzy Techniques of Big Data Classification
    El Bakry, Malak
    Safwat, Soha
    Hegazy, Osman
    PROCEEDINGS OF THE 2016 SAI COMPUTING CONFERENCE (SAI), 2016, : 118 - 128
  • [3] Distributed Big Data Clustering using MapReduce-based Fuzzy C-Medoids
    Sardar T.H.
    Ansari Z.
    Journal of The Institution of Engineers (India): Series B, 2022, 103 (01) : 73 - 82
  • [4] A MapReduce-based Fuzzy Associative Classifier for Big Data
    Ducange, Pietro
    Marcelloni, Francesco
    Segatori, Armando
    2015 IEEE INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS (FUZZ-IEEE 2015), 2015,
  • [5] MapReduce based distributed improved random forest model for graduates career classification
    Qiao F.
    Ge Y.
    Kong W.
    1600, Systems Engineering Society of China (37) : 1383 - 1392
  • [6] Efficient Querying Distributed Big-XML Data using MapReduce
    Song Kunfang
    Hongwei Lu
    INTERNATIONAL JOURNAL OF GRID AND HIGH PERFORMANCE COMPUTING, 2016, 8 (03) : 70 - 79
  • [7] A Survey on Geographically Distributed Big-Data Processing Using MapReduce
    Dolev, Shlomi
    Florissi, Patricia
    Gudes, Ehud
    Sharma, Shantanu
    Singer, Ido
    IEEE TRANSACTIONS ON BIG DATA, 2019, 5 (01) : 60 - 80
  • [8] Big data mining with parallel computing: A comparison of distributed and MapReduce methodologies
    Tsai, Chih-Fong
    Lin, Wei-Chao
    Ke, Shih-Wen
    JOURNAL OF SYSTEMS AND SOFTWARE, 2016, 122 : 83 - 92
  • [9] Random forest algorithm in big data environment
    Liu, Yingchun
    Computer Modelling and New Technologies, 2014, 18 (12) : 147 - 151
  • [10] A Distributed Fuzzy Associative Classifier for Big Data
    Segatori, Armando
    Bechini, Alessio
    Ducange, Pietro
    Marcelloni, Francesco
    IEEE TRANSACTIONS ON CYBERNETICS, 2018, 48 (09) : 2656 - 2669