PFIMD: a parallel MapReduce-based algorithm for frequent itemset mining

Cited by: 10
Authors
Mao, Yimin [1 ]
Geng, Junhao [1 ]
Mwakapesa, Deborah Simon [1 ]
Nanehkaran, Yaser Ahangari [1 ]
Chi, Zhang [1 ]
Deng, Xiaoheng [2 ]
Chen, Zhigang [2 ]
Affiliations
[1] Jiangxi Univ Sci & Technol, Sch Informat Engn, Ganzhou 341000, Jiangxi, Peoples R China
[2] Cent South Univ, Sch Comp Sci & Engn, Changsha 410083, Hunan, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
DiffNodeset structure; MapReduce; 2-Way comparison strategy; Load balancing strategy based on dynamic grouping; Frequent item mining;
DOI
10.1007/s00530-020-00725-x
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract
Frequent itemset mining (FIM) is an important data mining technique that is widely used to discover frequent itemsets. As datasets have grown rapidly, FIM has attracted many researchers and prompted numerous FIM algorithms for big data environments. This study designs an optimized parallel frequent itemset mining algorithm based on MapReduce, named PFIMD, to address the high time and space complexity of processing and computing itemsets, as well as the failure of existing parallel FIM algorithms to adequately balance the load among parallel tasks. First, a structure called DiffNodeset is adopted to avoid the growth of N-list cardinality in the MRPrePost algorithm. Then, a 2-way comparison strategy is designed to speed up DiffNodeset generation for 2-itemsets and reduce the time complexity of the algorithm. Next, the steps of the improved algorithm are parallelized using the Hadoop cloud computing platform and the MapReduce programming model. Moreover, to group the items in the F-list uniformly, a load balancing strategy based on dynamic grouping is proposed, which addresses the uneven load across nodes in the cluster. The experimental results show that the modified algorithm not only overcomes the shortcomings of MRPrePost in the big data environment but also greatly reduces time and space complexity. Finally, specific applications of the PFIMD algorithm to several multimedia datasets are listed to illustrate its generality.
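The abstract does not spell out how the F-list is built or how the dynamic grouping works, so the following is only a minimal single-machine sketch of the general idea: count item supports, keep the frequent items as an F-list, and spread them over groups so that the estimated load per group stays roughly even. The function names `build_f_list` and `group_items`, the use of support count as a load estimate, and the greedy "lightest group first" heuristic are all assumptions made for illustration; they are not the PFIMD strategy itself.

```python
from collections import Counter
import heapq


def build_f_list(transactions, min_support):
    """Return frequent items as (item, support) pairs, highest support first."""
    # Count each item at most once per transaction.
    counts = Counter(item for t in transactions for item in set(t))
    frequent = [(item, c) for item, c in counts.items() if c >= min_support]
    # Descending support; ties broken by item name so the order is deterministic.
    return sorted(frequent, key=lambda pair: (-pair[1], pair[0]))


def group_items(f_list, num_groups):
    """Greedily assign each item to the group with the smallest load so far.

    Assumption: an item's support count stands in for its mining cost.
    """
    heap = [(0, g) for g in range(num_groups)]  # (current load, group index)
    heapq.heapify(heap)
    groups = [[] for _ in range(num_groups)]
    loads = [0] * num_groups
    for item, load in f_list:
        _, g = heapq.heappop(heap)
        groups[g].append(item)
        loads[g] += load
        heapq.heappush(heap, (loads[g], g))
    return groups, loads


# Toy data, invented for this example.
transactions = [
    ["a", "b", "c"],
    ["a", "b"],
    ["a", "c"],
    ["a", "d"],
    ["b", "c"],
    ["e"],
]
f_list = build_f_list(transactions, min_support=2)
groups, loads = group_items(f_list, num_groups=2)
print(f_list)
print(groups, loads)
```

In a MapReduce setting, each group would typically be handled by one reduce task, so a more even spread of estimated load is what keeps any single node from becoming the bottleneck.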
Pages: 709-722
Page count: 14
Related papers
50 in total
  • [11] IOMRA - A High Efficiency Frequent Itemset Mining Algorithm Based on the MapReduce Computation Model
    Liu, Sheng-Hui
    Liu, Shi-Jia
    Chen, Shi-Xuan
    Yu, Kun-Ming
    [J]. 2014 IEEE 17TH INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE AND ENGINEERING (CSE), 2014, : 1290 - 1295
  • [12] YAFIM: A Parallel Frequent Itemset Mining Algorithm with Spark
    Qiu, Hongjian
    Gu, Rong
    Yuan, Chunfeng
    Huang, Yihua
    [J]. PROCEEDINGS OF 2014 IEEE INTERNATIONAL PARALLEL & DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS (IPDPSW), 2014, : 1664 - 1671
  • [13] A New Parallel Algorithm for the Frequent Itemset Mining Problem
    Craus, Mitica
    [J]. PROCEEDINGS OF THE INTERNATIONAL SYMPOSIUM ON PARALLEL AND DISTRIBUTED COMPUTING, 2008, : 165 - 170
  • [14] Parallel Processing of Frequent Itemset Based on MapReduce Programming Model
    Deshmukh, Rajshree A.
    Bharathi, H. N.
    Tripathy, Amiya K.
    [J]. 2019 5TH INTERNATIONAL CONFERENCE ON COMPUTING, COMMUNICATION, CONTROL AND AUTOMATION (ICCUBEA), 2019,
  • [15] A Novel Nodesets-Based Frequent Itemset Mining Algorithm for Big Data using MapReduce
    Sivaiah, Borra
    Rao, Ramisetty Rajeswara
    [J]. INTERNATIONAL JOURNAL OF ELECTRICAL AND COMPUTER ENGINEERING SYSTEMS, 2023, 14 (09) : 1051 - 1058
  • [16] A novel parallel frequent itemset mining algorithm for automatic enterprise
    Mao, Yimin
    Wu, Bin
    Deng, Qianhu
    Mahmoodi, Soroosh
    Chen, Zhigang
    Chen, Yeh-Cheng
    [J]. ENTERPRISE INFORMATION SYSTEMS, 2023, 17 (10)
  • [17] A Novel Parallel Algorithm for Frequent Itemset Mining of Incremental Dataset
    Xu, Lijun
    Zhang, Yun
    [J]. 2015 2ND INTERNATIONAL CONFERENCE ON INFORMATION SCIENCE AND CONTROL ENGINEERING ICISCE 2015, 2015, : 41 - 44
  • [18] MapReduce-based parallel GEP algorithm for efficient function mining in big data applications
    Liu, Yang
    Ma, Chenxiao
    Xu, Lixiong
    Shen, Xiaodong
    Li, Maozhen
    Li, Pengcheng
    [J]. CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2018, 30 (23):
  • [19] SmartCache: An Optimized MapReduce Implementation of Frequent Itemset Mining
    Huang, Dachuan
    Song, Yang
    Routray, Ramani
    Qin, Feng
    [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON CLOUD ENGINEERING (IC2E 2015), 2015, : 16 - 25
  • [20] MapReduce-based Parallelized Approximation of Frequent Itemsets Mining in Uncertain Data
    Xu, Jing
    Mao, Xiao-Jiao
    Lu, Wen-Yang
    Zhu, Qi-Hai
    Li, Ning
    Yang, Yu-Bin
    [J]. NEURAL INFORMATION PROCESSING, ICONIP 2015, PT IV, 2015, 9492 : 136 - 144