A Parallel Algorithm to Induce Decision Trees for Large Datasets

Cited by: 0

Authors:

Franco-Arcega, A. [1]

Suarez-Cansino, J. [1]

Flores-Flores, L. G. [1]

Affiliation:

[1] Autonomous Univ State Hidalgo, Basic Sci & Engn Inst, Informat & Syst Technol Res Ctr, Mineral De La Reforma 42184, Hidalgo, Mexico

Source:

2013 XXIV INTERNATIONAL SYMPOSIUM ON INFORMATION, COMMUNICATION AND AUTOMATION TECHNOLOGIES (ICAT) | 2013

Keywords:

DOI:

Not available

Chinese Library Classification:

TP [Automation and Computer Technology];

Discipline classification code:

0812;

Abstract:

This paper introduces a new parallel algorithm called ParDTLT and discusses some of its advantages over a set of well-known sequential and parallel algorithms. Parallel processing occurs at every node of the decision tree, which is constructed during the supervised training phase. Parallel tasks are distributed according to the attributes of the training objects, and the growth of the tree is governed by two criteria: the maximum number of training objects that each node can support and an entropy-based gain ratio criterion. Experiments compare the behavior of ParDTLT with that of the sequential algorithms C4.5, VFDT, YaDT and DTLT and with that of the parallel algorithm called Synchronous. The experimental results show that ParDTLT preserves classification quality while reducing execution time.
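The abstract names two ingredients that lend themselves to a small illustration: a gain ratio criterion and the distribution of work by attribute. The sketch below is not the ParDTLT implementation, which this record does not describe in detail. It only shows, under stated assumptions, how the gain ratio (as commonly defined for C4.5) could be scored for each categorical attribute in a separate task at a single node. The function names, the dictionary-of-columns layout, and the use of a thread pool are all hypothetical choices made for illustration.

```python
import math
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def entropy(items):
    """Shannon entropy (in bits) of a list of hashable values."""
    total = len(items)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(items).values())


def gain_ratio(values, labels):
    """Gain ratio of one categorical attribute (C4.5-style definition)."""
    total = len(labels)
    # Group the class labels by attribute value.
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    # Expected entropy of the labels after splitting on this attribute.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    # Split information: entropy of the attribute's own value distribution.
    split_info = entropy(values)
    if split_info == 0:
        return 0.0  # attribute has a single value, so it cannot split the node
    return (entropy(labels) - remainder) / split_info


def best_attribute(columns, labels, workers=4):
    """Score every attribute in its own task and return (name, score).

    Assumption: `columns` maps attribute name -> list of values, aligned
    with `labels`. The thread pool only illustrates the per-attribute
    task decomposition; it is not the paper's parallel runtime.
    """
    names = list(columns)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = list(pool.map(lambda n: gain_ratio(columns[n], labels), names))
    best = max(range(len(names)), key=lambda i: scores[i])
    return names[best], scores[best]


if __name__ == "__main__":
    labels = ["yes", "yes", "no", "no"]
    columns = {
        "a": ["x", "x", "y", "y"],  # separates the classes perfectly
        "b": ["p", "q", "p", "q"],  # carries no class information
    }
    print(best_attribute(columns, labels))
```

In CPython, threads do not speed up this kind of CPU-bound scoring; a process pool or a distributed runtime would be the natural substitute if real speed-up were the goal. The second growth criterion mentioned in the abstract, the maximum number of training objects per node, is deliberately left out because the record does not say how it is applied.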


Pages: 6
