MReC4.5: C4.5 ensemble classification with MapReduce

Cited by: 17
Authors
Wu, Gongqing [1]
Li, Haiguang [1]
Hu, Xuegang [1]
Bi, Yuanjun [1]
Zhang, Jing [1]
Wu, Xindong [1]
Affiliations
[1] Hefei Univ Technol, Sch Comp Sci & Informat Engn, Hefei, Peoples R China
Keywords
Distributed computing; data mining; ensemble learning; classification; MapReduce
DOI
10.1109/ChinaGrid.2009.39
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Classification is a significant technique in data mining research and applications. C4.5 is a widely used classification method, and ensemble learning adopts a parallel and distributed computing model for classification. Based on analyses of the MapReduce computing paradigm and the process of ensemble learning, we find that the parallel and distributed computing model in MapReduce is appropriate for implementing ensemble learning. This paper combines the advantages of C4.5, ensemble learning, and the MapReduce computing model, and proposes a new method, MReC4.5, for parallel and distributed ensemble classification. Our experimental results show that increasing the number of nodes benefits the effectiveness of classification modeling, and that serialization operations at the model level make the MReC4.5 classifier "construct once, use anywhere".
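The abstract gives only the outline of the approach, so the following is a minimal sketch of how an ensemble might be built in a map/reduce style and then serialized, not the authors' implementation. Everything in it is an assumption made for illustration: the one-feature threshold stump stands in for C4.5, the function names (`train_stump`, `merge_models`, `predict`) are hypothetical, majority voting is an assumed combination rule, the toy data is invented, and Python's `pickle` stands in for whatever model-level serialization the paper uses.

```python
import pickle
from collections import Counter
from functools import reduce


def train_stump(partition):
    """Map step (assumed): fit one base classifier on one data partition.

    A one-feature threshold stump is used instead of C4.5 to keep the
    sketch short. Returns a one-element list so models can be concatenated.
    """
    best = None
    n_features = len(partition[0][0])
    for f in range(n_features):
        for x, _ in partition:
            t = x[f]
            left = [y for xi, y in partition if xi[f] <= t]
            right = [y for xi, y in partition if xi[f] > t]
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0] if right else l_lab
            errors = sum(y != l_lab for y in left) + sum(y != r_lab for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    _, f, t, l_lab, r_lab = best
    return [(f, t, l_lab, r_lab)]


def merge_models(a, b):
    """Reduce step (assumed): assemble base classifiers into one ensemble."""
    return a + b


def predict(ensemble, x):
    """Classify x by majority vote over the base classifiers."""
    votes = Counter(l if x[f] <= t else r for f, t, l, r in ensemble)
    return votes.most_common(1)[0][0]


# Invented toy data: three partitions, as if held by three worker nodes.
partitions = [
    [((1.0,), "a"), ((2.0,), "a"), ((8.0,), "b"), ((9.0,), "b")],
    [((1.5,), "a"), ((3.0,), "a"), ((7.0,), "b"), ((9.5,), "b")],
    [((0.5,), "a"), ((2.5,), "a"), ((7.5,), "b"), ((8.5,), "b")],
]

# Train one model per partition (map), then combine them (reduce).
ensemble = reduce(merge_models, map(train_stump, partitions))

# Serialize the whole ensemble once; the bytes can be loaded elsewhere.
restored = pickle.loads(pickle.dumps(ensemble))
```

In this sketch the base classifiers are trained independently of one another, which is what makes the map step easy to distribute. Serializing the assembled ensemble as a single object is one way to read the "construct once, use anywhere" property mentioned in the abstract.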
Pages: 249-255
Number of pages: 7