MReC4.5: C4.5 ensemble classification with MapReduce

Cited by: 17
Authors
Wu, Gongqing [1]
Li, Haiguang [1]
Hu, Xuegang [1]
Bi, Yuanjun [1]
Zhang, Jing [1]
Wu, Xindong [1]
Affiliation
[1] Hefei University of Technology, School of Computer Science and Information Engineering, Hefei, People's Republic of China
Keywords
Distributed computing; data mining; ensemble learning; classification; MapReduce
DOI
10.1109/ChinaGrid.2009.39
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Classification is a significant technique in data mining research and applications. C4.5 is a widely used classification method, and ensemble learning adopts a parallel and distributed computing model for classification. Based on an analysis of the MapReduce computing paradigm and of the ensemble learning process, we find that the parallel and distributed computing model of MapReduce is well suited to implementing ensemble learning. This paper combines the advantages of C4.5, ensemble learning, and the MapReduce computing model, and proposes MReC4.5, a new method for parallel and distributed ensemble classification. Our experimental results show that increasing the number of nodes benefits the effectiveness of classification modeling, and that serialization operations at the model level make the MReC4.5 classifier "construct once, use anywhere".
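The abstract describes MReC4.5 only at a high level. What follows is a minimal, single-process Python sketch of the general idea, not the authors' implementation, under several stated assumptions. The "map" and "reduce" steps are plain functions rather than Hadoop tasks. Each map call trains one simplified C4.5-style tree on its own data split, using gain ratio, categorical attributes only, and no pruning, continuous attributes, or missing-value handling. The reduce call gathers the trees into an ensemble that predicts by simple majority vote, which is an assumed combination rule. Python's standard `pickle` module stands in for "model-level serialization". The functions `map_phase`, `reduce_phase`, and `ensemble_predict` and the toy weather data are all hypothetical.

```python
import math
import pickle
from collections import Counter

# Hypothetical single-process sketch: "map" and "reduce" are plain functions,
# not Hadoop tasks; the tree is a simplified C4.5-style learner (gain ratio,
# categorical attributes only, no pruning).


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(rows, labels, attr):
    """C4.5 split criterion: information gain divided by split information."""
    n = len(labels)
    groups = {}
    for row, y in zip(rows, labels):
        groups.setdefault(row[attr], []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    split_info = -sum(len(g) / n * math.log2(len(g) / n) for g in groups.values())
    gain = entropy(labels) - remainder
    return gain / split_info if split_info > 0 else 0.0


def build_tree(rows, labels, attrs, depth=0, max_depth=3):
    """A leaf is a class label; an inner node is (attr, branches, default_label)."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not attrs or depth == max_depth:
        return majority
    best = max(attrs, key=lambda a: gain_ratio(rows, labels, a))
    if gain_ratio(rows, labels, best) <= 0:
        return majority
    rest = [a for a in attrs if a != best]
    branches = {}
    for value in {r[best] for r in rows}:
        idx = [i for i, r in enumerate(rows) if r[best] == value]
        branches[value] = build_tree([rows[i] for i in idx], [labels[i] for i in idx],
                                     rest, depth + 1, max_depth)
    return (best, branches, majority)


def classify(tree, row):
    """Walk the tree; unseen attribute values fall back to the node's majority class."""
    while isinstance(tree, tuple):
        attr, branches, default = tree
        tree = branches.get(row[attr], default)
    return tree


def map_phase(partition, attrs):
    """Hypothetical map task: train one tree on a data split, emit it serialized."""
    rows, labels = zip(*partition)
    return ("ensemble", pickle.dumps(build_tree(list(rows), list(labels), attrs)))


def reduce_phase(serialized_models):
    """Hypothetical reduce task: deserialize and collect all base trees."""
    return [pickle.loads(m) for m in serialized_models]


def ensemble_predict(models, row):
    """Combine base classifiers by simple majority vote (an assumed rule)."""
    return Counter(classify(m, row) for m in models).most_common(1)[0][0]


data = [
    ({"outlook": "sunny", "windy": "no"}, "play"),
    ({"outlook": "rain", "windy": "yes"}, "stay"),
    ({"outlook": "overcast", "windy": "no"}, "play"),
    ({"outlook": "sunny", "windy": "yes"}, "stay"),
    ({"outlook": "rain", "windy": "no"}, "play"),
    ({"outlook": "overcast", "windy": "yes"}, "stay"),
]
attrs = ["outlook", "windy"]
partitions = [data[i::3] for i in range(3)]  # one split per simulated node

emitted = [map_phase(p, attrs) for p in partitions]
models = reduce_phase([blob for _key, blob in emitted])

# "Construct once, use anywhere": persist the whole ensemble, reload it elsewhere.
restored = pickle.loads(pickle.dumps(models))
print(ensemble_predict(restored, {"outlook": "rain", "windy": "no"}))  # prints "play"
```

Each split here contains one "windy" and one calm example, so every base tree splits on `windy`. Serializing the trained list of trees means the ensemble can be reloaded and used without retraining. In a real MapReduce deployment, the splits would come from the framework's input partitioning rather than from list slicing.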
Pages: 249-255
Number of pages: 7
Related papers
50 in total
  • [21] A Combination Classification Algorithm Based on Outlier Detection and C4.5
    Jiang, ShengYi
    Yu, Wen
    [J]. ADVANCED DATA MINING AND APPLICATIONS, PROCEEDINGS, 2009, 5678 : 504 - 511
  • [22] Facial beauty classification based on geometric features and C4.5
    Mao, Hui-Yun
    Jin, Lian-Wen
    Du, Ming-Hui
    [J]. Moshi Shibie yu Rengong Zhineng/Pattern Recognition and Artificial Intelligence, 2010, 23 (06): 809 - 814
  • [23] Using C4.5 as variable selection criterion in classification tasks
    Martínez, J
    Fuentes, O
    [J]. PROCEEDINGS OF THE NINTH IASTED INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND SOFT COMPUTING, 2005: 171 - 176
  • [24] Comparative Analysis of Data Customer Classification with C4.5 Algorithm
    Aisyah, Siti
    Rumapea, Bondang Johanes
    Halwan, M. Ghifari
    Siahaan, Denny Hartanto
    [J]. INTERNETWORKING INDONESIA, 2020, 12 (02): 3 - 7
  • [25] Classification of Thrombosis Collagen Diseases based on C4.5 Algorithm
    Soliman, Sarah A.
    Abbas, Safia
    Salem, Abdel-Badeeh M.
    [J]. 2015 IEEE SEVENTH INTERNATIONAL CONFERENCE ON INTELLIGENT COMPUTING AND INFORMATION SYSTEMS (ICICIS), 2015: 131 - 136
  • [26] Optimization of the C4.5 algorithm
    Huang, Xiuxia
    Sun, Li
    [J]. Jisuanji Gongcheng yu Sheji/Computer Engineering and Design, 2016, 37 (05): 1265 - 1270
  • [27] The Application and Research of C4.5 Algorithm
    Zhao, Hongyan
    [J]. APPLIED SCIENCE, MATERIALS SCIENCE AND INFORMATION TECHNOLOGIES IN INDUSTRY, 2014, 513-517 : 1285 - 1288
  • [28] Credal C4.5 with Refinement of Parameters
    Mantas, Carlos J.
    Abellan, Joaquin
    Castellano, Javier G.
    Cano, Jose R.
    Moral, Serafin
    [J]. INFORMATION PROCESSING AND MANAGEMENT OF UNCERTAINTY IN KNOWLEDGE-BASED SYSTEMS: APPLICATIONS, IPMU 2018, PT III, 2018, 855 : 739 - 747
  • [29] AUC4.5: AUC-Based C4.5 Decision Tree Algorithm for Imbalanced Data Classification
    Lee, Jong-Seok
    [J]. IEEE ACCESS, 2019, 7 : 106034 - 106042
  • [30] Medical diagnosis with C4.5 rule preceded by artificial neural network ensemble
    Zhou, ZH
    Jiang, Y
    [J]. IEEE TRANSACTIONS ON INFORMATION TECHNOLOGY IN BIOMEDICINE, 2003, 7 (01): 37 - 42