PFIMD: a parallel MapReduce-based algorithm for frequent itemset mining

Citations: 10

Authors:

Mao, Yimin [1]

Geng, Junhao [1]

Mwakapesa, Deborah Simon [1]

Nanehkaran, Yaser Ahangari [1]

Chi, Zhang [1]

Deng, Xiaoheng [2]

Chen, Zhigang [2]

Affiliations:

[1] Jiangxi Univ Sci & Technol, Sch Informat Engn, Ganzhou 341000, Jiangxi, Peoples R China

[2] Cent South Univ, Sch Comp Sci & Engn, Changsha 410083, Hunan, Peoples R China

Source:

MULTIMEDIA SYSTEMS | 2021 / Vol. 27 / Issue 04

Funding:

National Natural Science Foundation of China;

Keywords:

DiffNodeset structure; MapReduce; 2-Way comparison strategy; Load balancing strategy based on dynamic grouping; Frequent item mining;

DOI:

10.1007/s00530-020-00725-x

Chinese Library Classification number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

Frequent itemset mining (FIM) is an important data mining technique that is widely used in many applications to discover frequently occurring items. With the rapid growth of datasets, FIM has attracted considerable research interest, which has led to many new FIM algorithms for big data environments. This study designs an optimized parallel frequent itemset mining algorithm based on MapReduce, named PFIMD, to address the high time and space complexity of processing and computing itemsets, as well as the failure of existing parallel FIM algorithms to adequately balance the load among parallel tasks. First, a structure called DiffNodeset is adopted to effectively avoid the growth of N-list cardinality in the MRPrePost algorithm. Then, a 2-way comparison strategy is designed to speed up DiffNodeset generation for 2-itemsets and reduce the time complexity of the algorithm. Next, the steps of the improved algorithm are parallelized using the Hadoop cloud computing platform and the MapReduce programming model. Moreover, to group the items in the F-list evenly, a load balancing strategy based on dynamic grouping is proposed, which addresses the uneven load across the nodes in the cluster. The experimental results show that the modified algorithm not only overcomes the shortcoming of MRPrePost in big data environments, but also greatly reduces time and space complexity. Finally, specific applications of PFIMD to several multimedia datasets are listed to illustrate its general applicability.

Pages: 709-722

Page count: 14

