A Parallel Algorithm to Induce Decision Trees for Large Datasets

Citations: 0
Authors
Franco-Arcega, A. [1 ]
Suarez-Cansino, J. [1 ]
Flores-Flores, L. G. [1 ]
Affiliation
[1] Autonomous Univ State Hidalgo, Basic Sci & Engn Inst, Informat & Syst Technol Res Ctr, Mineral De La Reforma 42184, Hidalgo, Mexico
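Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract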
This paper introduces a new parallel algorithm called ParDTLT and discusses some of its advantages over a set of well-known sequential and parallel algorithms. Parallel processing takes place at every node of the decision tree, which is built during the supervised training phase. Parallel tasks are distributed according to the attributes of the training objects, and the growth of the tree is governed by two criteria: the maximum number of training objects that each node can hold and an entropy-based gain ratio criterion. Experiments compare the behavior of ParDTLT with that of the sequential algorithms C4.5, VFDT, YaDT and DTLT and of the parallel algorithm called Synchronous. The experimental results show that ParDTLT maintains classification quality while reducing execution time.
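The abstract names two ingredients, attribute-wise distribution of work at a node and a gain ratio criterion, without giving details. The following is only a minimal sketch of that general idea, not the authors' ParDTLT implementation: it scores each categorical attribute as an independent task and picks the one with the highest gain ratio. The function names (`entropy`, `gain_ratio`, `best_attribute`, `node_is_full`), the `max_workers` and `max_objects` parameters, and the reading of the capacity criterion as a simple object count are all assumptions made for illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def gain_ratio(rows, labels, attr):
    """Gain ratio of splitting on the categorical attribute at index `attr`."""
    total = len(rows)
    # Group class labels by the value each row takes on this attribute.
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attr], []).append(label)
    # Weighted entropy remaining after the split.
    remainder = sum(len(p) / total * entropy(p) for p in partitions.values())
    # Split information: entropy of the partition sizes themselves.
    split_info = -sum(
        (len(p) / total) * log2(len(p) / total) for p in partitions.values()
    )
    if split_info == 0:  # attribute takes a single value, so it cannot split
        return 0.0
    return (entropy(labels) - remainder) / split_info


def best_attribute(rows, labels, max_workers=4):
    """Score every attribute as an independent task; return (index, score)."""
    attrs = range(len(rows[0]))
    # Threads only illustrate the task decomposition (one task per attribute);
    # a real system would use processes or a cluster for actual speedup.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        scores = list(pool.map(lambda a: gain_ratio(rows, labels, a), attrs))
    best = max(attrs, key=lambda a: scores[a])
    return best, scores[best]


def node_is_full(labels, max_objects):
    """Assumed capacity rule: the node already holds max_objects or more."""
    return len(labels) >= max_objects
```

Under these assumptions, a node would call `node_is_full` to decide whether to attempt a split and `best_attribute` to choose the splitting attribute; for a toy dataset in which the first attribute separates the classes perfectly, `best_attribute` selects index 0.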
Pages: 6