Parallel formulations of decision-tree classification algorithms

Cited by: 70
Authors
Srivastava, A
Han, EH
Kumar, V
Singh, V
Affiliations
[1] Univ Minnesota, Dept Comp Sci & Engn, Army HPC Res Ctr, Minneapolis, MN 55455 USA
[2] Hitachi Amer Inc, Informat Technol Lab, Tarrytown, NY 10591 USA
Funding
National Science Foundation (USA)
Keywords
data mining; parallel processing; classification; scalability; decision trees
DOI
10.1023/A:1009832825273
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Classification decision tree algorithms are used extensively for data mining in many domains, such as retail target marketing and fraud detection. Highly parallel algorithms for constructing classification decision trees are desirable for dealing with large data sets in a reasonable amount of time. Algorithms for building classification decision trees have a natural concurrency, but are difficult to parallelize due to the inherent dynamic nature of the computation. In this paper, we present parallel formulations of a classification decision tree learning algorithm based on induction. We describe two basic parallel formulations. One is based on the Synchronous Tree Construction Approach and the other is based on the Partitioned Tree Construction Approach. We discuss the advantages and disadvantages of using these methods and propose a hybrid method that employs the good features of both. We also provide an analysis of the computation and communication costs of the proposed hybrid method. Moreover, experimental results on an IBM SP-2 demonstrate excellent speedups and scalability.
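As a rough illustration of the synchronous idea mentioned in the abstract, the sketch below simulates several processors that each hold a slice of the training data, count class distributions locally, and then combine those counts so that every processor can pick the same split. It is only a single-process simulation under stated assumptions: the abstract does not give the split criterion or the communication primitives, so information gain, the in-memory "reduction", and all function names here are hypothetical choices made for the example, not the authors' implementation.

```python
from collections import Counter
from math import log2

def local_class_counts(partition, attr):
    """Per-processor step: count (attribute value, class) pairs in local data."""
    counts = Counter()
    for record, label in partition:
        counts[(record[attr], label)] += 1
    return counts

def global_reduce(local_counts):
    """Stand-in for a global reduction: sum the histograms of all processors."""
    total = Counter()
    for counts in local_counts:
        total.update(counts)
    return total

def entropy(class_counts):
    """Shannon entropy (in bits) of a class-count histogram."""
    n = sum(class_counts.values())
    return -sum((c / n) * log2(c / n) for c in class_counts.values() if c)

def information_gain(counts):
    """Gain of splitting on an attribute, from its global (value, class) counts."""
    by_class, by_value = Counter(), {}
    for (value, label), c in counts.items():
        by_class[label] += c
        by_value.setdefault(value, Counter())[label] += c
    n = sum(by_class.values())
    remainder = sum(sum(v.values()) / n * entropy(v) for v in by_value.values())
    return entropy(by_class) - remainder

def choose_split(partitions, attrs):
    """Assumed criterion: pick the attribute with the highest information gain."""
    gains = {}
    for attr in attrs:
        merged = global_reduce(local_class_counts(p, attr) for p in partitions)
        gains[attr] = information_gain(merged)
    return max(gains, key=gains.get), gains

# Two simulated processors, each holding half of a toy data set.
partitions = [
    [({"outlook": "sunny", "windy": "no"}, "play"),
     ({"outlook": "rain", "windy": "no"}, "stay")],
    [({"outlook": "sunny", "windy": "yes"}, "play"),
     ({"outlook": "rain", "windy": "yes"}, "stay")],
]
best, gains = choose_split(partitions, ["outlook", "windy"])
print(best, gains)
```

Because only the small count histograms are combined, the training records themselves never move between the simulated processors; in this toy data set the merged counts select `outlook`, the same attribute a single processor holding all four records would choose.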
Pages: 237-261
Number of pages: 25
Related papers
50 in total
  • [1] Parallel formulations of decision-tree classification algorithms
    Srivastava, A
    Han, EH
    Kumar, V
    Singh, V
    [J]. 1998 INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING - PROCEEDINGS, 1998, : 237 - 244
  • [2] Parallel Formulations of Decision-Tree Classification Algorithms
    Anurag Srivastava
    Eui-Hong Han
    Vipin Kumar
    Vineet Singh
    [J]. Data Mining and Knowledge Discovery, 1999, 3 : 237 - 261
  • [3] Extensions to decision-tree based packet classification algorithms to address new classification paradigms
    Stimpfling, Thibaut
    Belanger, Normand
    Cherkaoui, Omar
    Beliveau, Andre
    Beliveau, Ludovic
    Savaria, Yvon
    [J]. COMPUTER NETWORKS, 2017, 122 : 83 - 95
  • [4] Automatic Design of Decision-Tree Algorithms with Evolutionary Algorithms
    Barros, Rodrigo C.
    Basgalupp, Marcio P.
    de Carvalho, Andre C. P. L. F.
    Freitas, Alex A.
    [J]. EVOLUTIONARY COMPUTATION, 2013, 21 (04) : 659 - 684
  • [5] A Survey of Evolutionary Algorithms for Decision-Tree Induction
    Barros, Rodrigo Coelho
    Basgalupp, Marcio Porto
    de Carvalho, Andre C. P. L. F.
    Freitas, Alex A.
    [J]. IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART C-APPLICATIONS AND REVIEWS, 2012, 42 (03) : 291 - 312
  • [6] Estimation of Distribution Algorithms for Decision-Tree Induction
    Cagnini, Henry E. L.
    Barros, Rodrigo C.
    Basgalupp, Marcio P.
    [J]. 2017 IEEE CONGRESS ON EVOLUTIONARY COMPUTATION (CEC), 2017, : 2022 - 2029
  • [7] An Algorithm of Decision-tree Generating Automatically Based on Classification
    Hu, Lihong
    Yu, Zifan
    Liu, Yanfang
    [J]. PROCEEDINGS OF THE FIRST INTERNATIONAL WORKSHOP ON EDUCATION TECHNOLOGY AND COMPUTER SCIENCE, VOL I, 2009, : 823 - +
  • [8] INTERPRETABLE DECISION-TREE INDUCTION IN A BIG DATA PARALLEL FRAMEWORK
    Weinberg, Abraham Itzhak
    Last, Mark
    [J]. INTERNATIONAL JOURNAL OF APPLIED MATHEMATICS AND COMPUTER SCIENCE, 2017, 27 (04) : 737 - 748
  • [9] A novel decision-tree based classification of white blood cells
    Xuan, X
    Liao, QM
    Jiang, K
    [J]. MEDICAL IMAGING 2005: IMAGE PROCESSING, PT 1-3, 2005, 5747 : 1120 - 1127
  • [10] Selecting a representative decision tree from an ensemble of decision-tree models for fast big data classification
    Weinberg, Abraham Itzhak
    Last, Mark
    [J]. JOURNAL OF BIG DATA, 2019, 6 (01)