A Parallel Algorithm to Induce Decision Trees for Large Datasets

Cited by: 0
Authors
Franco-Arcega, A. [1 ]
Suarez-Cansino, J. [1 ]
Flores-Flores, L. G. [1 ]
Affiliations
[1] Autonomous Univ State Hidalgo, Basic Sci & Engn Inst, Informat & Syst Technol Res Ctr, Mineral De La Reforma 42184, Hidalgo, Mexico
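Keywords
DOI
Not available
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract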
This paper introduces a new parallel algorithm called ParDTLT and discusses some of its advantages over a set of well-known sequential and parallel algorithms. Parallel processing takes place at every node of the decision tree, which is built during the supervised training phase. Parallel tasks are distributed according to the attributes of the training objects, and the growth of the tree is governed by two criteria: the maximum number of training objects that each node can hold and an entropy-based gain ratio criterion. Experiments compare the behavior of ParDTLT with that of the sequential algorithms C4.5, VFDT, YaDT and DTLT and with the parallel algorithm called Synchronous. The experimental results show that ParDTLT maintains classification quality while reducing execution time.
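The attribute-wise distribution of work described in the abstract can be illustrated with a minimal sketch. This is not the ParDTLT implementation: it assumes categorical attributes, uses the standard C4.5-style gain ratio, and does not model the node-capacity criterion. The function names (`entropy`, `gain_ratio`, `best_attribute`) and the `workers` parameter are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def gain_ratio(rows, labels, attr):
    """Gain ratio of splitting on the categorical attribute at index `attr`."""
    # Group the class labels by the value each object takes on this attribute.
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    total = len(labels)
    # Weighted entropy of the partitions produced by the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    # Split information penalizes attributes with many distinct values.
    split_info = -sum(len(g) / total * math.log2(len(g) / total) for g in groups.values())
    if split_info == 0:  # attribute takes a single value: no useful split
        return 0.0
    return (entropy(labels) - remainder) / split_info


def best_attribute(rows, labels, workers=4):
    """Score each attribute as an independent task and return the best index."""
    attrs = range(len(rows[0]))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = list(pool.map(lambda a: gain_ratio(rows, labels, a), attrs))
    return max(attrs, key=lambda a: scores[a])


if __name__ == "__main__":
    rows = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y")]
    labels = [0, 0, 1, 1]
    print(best_attribute(rows, labels))
```

Each attribute is scored without reference to the others, which is what makes the per-node work easy to split across workers. Threads are used here only to keep the sketch short; for CPU-bound scoring in CPython, a process pool or a distributed runtime would likely be needed to see an actual reduction in execution time.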
Pages: 6
Related papers
50 in total
  • [31] Large Scale Prediction with Decision Trees
    Klusowski, Jason M.
    Tian, Peter M.
    [J]. JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2024, 119 (545) : 525 - 537
  • [32] FisherMP: fully parallel algorithm for detecting combinatorial motifs from large ChIP-seq datasets
    Zhang, Shaoqiang
    Liang, Ying
    Wang, Xiangyun
    Su, Zhengchang
    Chen, Yong
    [J]. DNA RESEARCH, 2019, 26 (03) : 231 - 242
  • [33] Data Structures for Parallel Spatial Algorithms on Large Datasets
    Franklin, W. Randolph
    Gomes de Magalhaes, Salles Viana
    Alvim Andrade, Marcus Vinicius
    [J]. BIGSPATIAL 2018: PROCEEDINGS OF THE 7TH ACM SIGSPATIAL INTERNATIONAL WORKSHOP ON ANALYTICS FOR BIG GEOSPATIAL DATA (BIGSPATIAL-2018), 2018, : 16 - 19
  • [34] MEASURING, TESTING, AND IDENTIFYING HETEROGENEITY OF LARGE PARALLEL DATASETS
    Peng, Liuhua
    Wang, Guanghui
    Zou, Changliang
    [J]. STATISTICA SINICA, 2023, 33 (04) : 2787 - 2808
  • [35] Parallel Rule Discovery from Large Datasets by Sampling
    Fan, Wenfei
    Han, Ziyan
    Wang, Yaoshu
    Xie, Min
    [J]. PROCEEDINGS OF THE 2022 INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA (SIGMOD '22), 2022, : 384 - 398
  • [36] A Scalable Classification Algorithm for Very Large Datasets
    Delen, Dursun
    Kletke, Marilyn
    Kim, Jin-Hwa
    [J]. JOURNAL OF INFORMATION & KNOWLEDGE MANAGEMENT, 2005, 4 (02) : 83 - 94
  • [37] MPRK Algorithm for Clustering the Large Text Datasets
    Thangarasu, M.
    Inbarani, H. Hannah
    [J]. 2016 IEEE INTERNATIONAL CONFERENCE ON ADVANCES IN COMPUTER APPLICATIONS (ICACA), 2016, : 224 - 229
  • [38] eTURF: A competitive TURF algorithm for large datasets
    Ennis, John M.
    Fayle, Charles M.
    Ennis, Daniel M.
    [J]. FOOD QUALITY AND PREFERENCE, 2012, 23 (01) : 44 - 48
  • [39] A Parallel Algorithm for Building iCPI-trees
    Andrzejewski, Witold
    Boinski, Pawel
    [J]. ADVANCES IN DATABASES AND INFORMATION SYSTEMS (ADBIS 2014), 2014, 8716 : 276 - 289
  • [40] A parallel algorithm for building iCPI-trees*
    Andrzejewski, Witold
    Boinski, Pawel
    [J]. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2014, 8716 : 276 - 289