A Parallel Data Mining Approach Based on Segmentation and Pruning Optimization

Authors
Wang, Jiameng [1, 2]
Yin, Yunfei [2 ]
Deng, Xiyu [2 ]
Affiliations
[1] Guangxi Normal Univ, Guangxi Key Lab Multisource Informat Min & Secur, Guilin 541004, Peoples R China
[2] Chongqing Univ, Coll Comp Sci, Chongqing 400044, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
classification; parallel data mining; segmentation point; Gini index; pruning ahead
DOI
10.3103/S0146411620060097
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Parallel optimization is currently an important research topic in data mining. Taking the parallelization of CART as an example, this paper proposes a parallel data mining algorithm based on segmentation and pruning optimization, called SSP-OGini-PCCP. To address the problem of choosing the best CART segmentation point, an S-SP model without data association is designed. To compute the Gini index efficiently, a parallel OGini calculation method is designed. To improve the efficiency of pruning, a synchronous PCCP pruning strategy is proposed. These three components (optimal segmentation calculation, Gini index calculation, and the pruning algorithm) are important parts of parallel data mining and are studied in depth. The SSP-OGini-PCCP method is tested on a simulated distributed cluster built on Spark. The experimental results show that the method can significantly improve the efficiency of data classification and decision making, meeting the demands of contemporary large-scale data processing.
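
For readers unfamiliar with the quantities the abstract refers to, the following is a minimal sketch of the textbook CART computation: the Gini impurity of a label set and an exhaustive search for the best threshold on one numeric feature. It is not the S-SP, OGini, or PCCP method of the paper, whose details the abstract does not give; the function names `gini` and `best_split` are hypothetical and chosen only for illustration.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a label list: 1 - sum_k p_k^2 (0.0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(values, labels):
    """Return (threshold, weighted_gini) for the split `value <= threshold`
    that minimizes the size-weighted Gini impurity of the two sides.

    Standard serial CART search over one numeric feature; an illustrative
    assumption, not the paper's algorithm.
    """
    pairs = sorted(zip(values, labels), key=lambda p: p[0])
    n = len(pairs)
    best = (None, float("inf"))
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no threshold can separate equal feature values
        threshold = (lo + hi) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best


if __name__ == "__main__":
    print(gini(["a", "a", "b", "b"]))
    print(best_split([1, 2, 3, 4], ["a", "a", "b", "b"]))
```

In this serial form, the score of each candidate threshold depends only on the sorted data and not on the scores of other candidates, which is one reason split-point search is commonly distributed across workers.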
Pages: 483-492
Page count: 10
Related papers
50 in total
  • [21] Against Classification Attacks: A Decision Tree Pruning Approach to Privacy Protection in Data Mining
    Li, Xiao-Bai
    Sarkar, Sumit
    OPERATIONS RESEARCH, 2009, 57 (06) : 1496 - 1509
  • [23] ParallAX - A data mining tool based on parallel coordinates
    Avidan, T
    Avidan, S
    COMPUTATIONAL STATISTICS, 1999, 14 (01) : 79 - 89
  • [24] A Parallel Data Mining Method based on Complex Network
    He Yan-li
    OPTICAL, ELECTRONIC MATERIALS AND APPLICATIONS, PTS 1-2, 2011, 216 : 752 - 756
  • [25] A parallel approach for high utility-based frequent pattern mining in a big data environment
    Krishna Kumar Mohbey
    Sunil Kumar
    Iran Journal of Computer Science, 2021, 4 (3) : 195 - 200
  • [26] Optimization Technique Based Approach for Image Segmentation
    Poojary, Manjula
    Srinivas, Yarramalle
    CURRENT MEDICAL IMAGING, 2023, 19 (10) : 1167 - 1177
  • [27] An Optimization-based Approach to Key Segmentation
    Chuan, Ching-Hua
    Chew, Elaine
    PROCEEDINGS OF 2016 IEEE INTERNATIONAL SYMPOSIUM ON MULTIMEDIA (ISM), 2016, : 603 - 608
  • [28] Multilevel thresholding for image segmentation based on parallel distributed optimization
    Sandeli, Mohamed
    Batouche, Mohamed
    2014 6TH INTERNATIONAL CONFERENCE OF SOFT COMPUTING AND PATTERN RECOGNITION (SOCPAR), 2014, : 134 - 139
  • [29] A family of optimization based data mining methods
    Shi, Yong
    Liu, Rong
    Yan, Nian
    Chen, Zhenxing
    PROGRESS IN WWW RESEARCH AND DEVELOPMENT, PROCEEDINGS, 2008, 4976 : 26 - +
  • [30] The Method of Parallel Optimization and Parallel Recognition Based on Data Dependence
    Yan, Zhao
    Liu, Lei
    Ma, Li
    SNPD 2009: 10TH ACIS INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING, ARTIFICIAL INTELLIGENCES, NETWORKING AND PARALLEL DISTRIBUTED COMPUTING, PROCEEDINGS, 2009, : 120 - +