A Parallel Data Mining Approach Based on Segmentation and Pruning Optimization

Cited by: 0
Authors
Wang, Jiameng [1, 2]
Yin, Yunfei [2]
Deng, Xiyu [2]
Affiliations
[1] Guangxi Normal University, Guangxi Key Laboratory of Multi-source Information Mining & Security, Guilin 541004, People's Republic of China
[2] Chongqing University, College of Computer Science, Chongqing 400044, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
classification; parallel data mining; segmentation point; Gini index; pruning ahead
DOI
10.3103/S0146411620060097
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Parallel optimization is currently an important research topic in data mining. Using CART parallelization as a case study, this paper proposes a parallel data mining algorithm based on segmentation and pruning optimization, called SSP-OGini-PCCP. To select the best CART segmentation (split) point, it designs an S-SP model that requires no data association; to compute the Gini index efficiently, it designs a parallel OGini calculation method; and to make pruning more efficient, it proposes a synchronous PCCP pruning strategy. These three components (optimal segmentation-point calculation, Gini index calculation, and pruning) are core parts of parallel data mining, and each is studied in depth. SSP-OGini-PCCP is evaluated on a distributed cluster simulation system built on SPARK. The experimental results show that the method significantly improves the efficiency of data classification and decision-making, meeting the demands of present-day large-scale data processing.
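The abstract does not specify the S-SP, OGini, or PCCP algorithms themselves. As a rough illustration of the standard CART quantities they build on, the sketch below computes the Gini impurity 1 - sum(p_k^2), searches for the best numeric split point by merging per-partition class counts (a map/reduce pattern similar in spirit to a Spark job, simulated here with plain Python lists), and computes the weakest-link alpha used in standard CART cost-complexity pruning. All function names are hypothetical; this is not the authors' SSP-OGini-PCCP method, and the link between PCCP and cost-complexity pruning is an assumption.

```python
from collections import Counter
from typing import List, Optional, Tuple

# Hypothetical data layout: each partition is a list of (feature_value, class_label) rows.
Partition = List[Tuple[float, str]]


def gini(counts: Counter) -> float:
    """Gini impurity 1 - sum(p_k^2) for a table of class counts."""
    n = sum(counts.values())
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def partition_histogram(rows: Partition, threshold: float) -> Tuple[Counter, Counter]:
    """Map step (runs independently per partition): class counts left/right of a threshold."""
    left, right = Counter(), Counter()
    for x, y in rows:
        (left if x <= threshold else right)[y] += 1
    return left, right


def weighted_gini(partitions: List[Partition], threshold: float) -> float:
    """Reduce step: merge per-partition counts, then weight each side's Gini by its size."""
    left, right = Counter(), Counter()
    for part in partitions:
        l, r = partition_histogram(part, threshold)
        left.update(l)   # Counter.update adds counts rather than replacing them
        right.update(r)
    n_left, n_right = sum(left.values()), sum(right.values())
    n = n_left + n_right
    return (n_left / n) * gini(left) + (n_right / n) * gini(right)


def best_split(partitions: List[Partition]) -> Optional[Tuple[float, float]]:
    """Try midpoints between consecutive distinct values; return (threshold, weighted Gini).

    Returns None when the feature has fewer than two distinct values (no split possible).
    """
    values = sorted({x for part in partitions for x, _ in part})
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    best: Optional[Tuple[float, float]] = None
    for t in candidates:
        g = weighted_gini(partitions, t)
        if best is None or g < best[1]:
            best = (t, g)
    return best


def ccp_alpha(node_error: float, subtree_error: float, n_leaves: int) -> float:
    """Standard CART weakest-link value alpha = (R(t) - R(T_t)) / (|leaves(T_t)| - 1).

    The internal node with the smallest alpha is pruned first; requires n_leaves > 1.
    """
    return (node_error - subtree_error) / (n_leaves - 1)
```

For example, with two partitions `[(1.0, "a"), (2.0, "a")]` and `[(3.0, "b"), (4.0, "b")]`, `best_split` picks the threshold 2.5, which separates the classes perfectly (weighted Gini 0.0). Only small count tables cross partition boundaries, which is the usual reason split search parallelizes well; how the paper's OGini method actually distributes this work is not stated in the abstract.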
Pages: 483-492
Number of pages: 10
Related papers
  • [1] A Parallel Data Mining Approach Based on Segmentation and Pruning Optimization
    Wang, Jiameng
    Yin, Yunfei
    Deng, Xiyu
    Automatic Control and Computer Sciences, 2020, 54 : 483 - 492
  • [2] Bayesian Optimization based on the data parallel approach
    Lv, Zhiming
    Zhao, Jun
    Wang, Wei
    2017 CHINESE AUTOMATION CONGRESS (CAC), 2017, : 1671 - 1675
  • [3] Parallel Clustering Optimization Algorithm Based on MapReduce in Big Data Mining
    Zhang, Huajie
    Song, Lei
    Zhang, Sen
    IAENG International Journal of Applied Mathematics, 2023, 53 (01):
  • [4] Performance Based Segmentation of Small and Medium Enterprises: A Data Mining Approach
    Hanif, Aamer
    Manarvi, Irfan Anjum
    CIE: 2009 INTERNATIONAL CONFERENCE ON COMPUTERS AND INDUSTRIAL ENGINEERING, VOLS 1-3, 2009, : 1509 - +
  • [5] An approach to parallel data mining for pharmacophore discovery
    Kamal, AH
    Graham, JH
    Page, CD
    INTELLIGENT SYSTEMS, 2001, : 100 - 103
  • [6] Tuning metaheuristics: A data mining based approach for particle swarm optimization
    Lessmann, Stefan
    Caserta, Marco
    Montalvo Arango, Idel
    EXPERT SYSTEMS WITH APPLICATIONS, 2011, 38 (10) : 12826 - 12838
  • [7] Semantic Segmentation Optimization Algorithm Based on Knowledge Distillation and Model Pruning
    Yao, Weiwei
    Zhang, Jie
    Li, Chen
    Li, Shiyun
    He, Li
    Zhang, Bo
    2019 2ND INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND BIG DATA (ICAIBD 2019), 2019, : 261 - 265
  • [8] Pruning Optimization in Frequent Itemset Mining Algorithm Based on Bit Combination
    Lu, Jun
    Zhou, Kailong
    Guo, Zhicong
    BASIC & CLINICAL PHARMACOLOGY & TOXICOLOGY, 2020, 126 : 115 - 116
  • [9] Parallel mining of uncertain data using segmentation of data set area and Voronoi diagrams
    Lukic, Ivica
    Hocenski, Zeljko
    Kohler, Mirko
    Galba, Tomislav
    AUTOMATIKA, 2018, 59 (3-4) : 349 - 356
  • [10] Pruning association rules in data mining
    Qin, Min
    Li, Zhi-Zhu
    2001, Shanghai Jiao Tong University (35):