A distributed frequent itemset mining algorithm using Spark for Big Data analytics

Cited by: 60
Authors
Zhang, Feng [1 ,3 ]
Liu, Min [1 ]
Gui, Feng [1 ]
Shen, Weiming [2 ]
Shami, Abdallah [3 ]
Ma, Yunlong [1 ]
Affiliations
[1] Tongji Univ, Sch Elect & Informat Engn, Shanghai 201804, Peoples R China
[2] Tongji Univ, Key Lab Embedded Syst & Serv Comp, Shanghai 201804, Peoples R China
[3] Univ Western Ontario, Dept Elect & Comp Engn, London, ON N6A 5B9, Canada
Keywords
Distributed data mining algorithm; Frequent itemset mining; Big data; Spark; MapReduce
DOI
10.1007/s10586-015-0477-1
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Frequent itemset mining is an essential step in association rule mining. In the big data era, conventional approaches to mining frequent itemsets encounter significant challenges when computing power and memory space are limited. This paper proposes an efficient distributed frequent itemset mining algorithm (DFIMA) that significantly reduces the number of candidate itemsets by applying a matrix-based pruning approach. The proposed algorithm has been implemented using Spark to further improve the efficiency of iterative computation. Numerical experiments on standard benchmark datasets, comparing the proposed algorithm with the existing parallel FP-growth algorithm, show that DFIMA has better efficiency and scalability. In addition, a case study has been carried out to validate the feasibility of DFIMA.
Pages: 1493-1501
Page count: 9
Related papers
50 in total
  • [1] A distributed frequent itemset mining algorithm using Spark for Big Data analytics
    Feng Zhang
    Min Liu
    Feng Gui
    Weiming Shen
    Abdallah Shami
    Yunlong Ma
    [J]. Cluster Computing, 2015, 18 : 1493 - 1501
  • [2] A Distributed Frequent Itemset Mining Algorithm Based on Spark
    Gui, Feng
    Ma, Yunlong
    Zhang, Feng
    Liu, Min
    Li, Fei
    Shen, Weiming
    Bai, Hua
    [J]. PROCEEDINGS OF THE 2015 IEEE 19TH INTERNATIONAL CONFERENCE ON COMPUTER SUPPORTED COOPERATIVE WORK IN DESIGN (CSCWD), 2015: 271 - 275
  • [3] An Efficient Spark-Based Hybrid Frequent Itemset Mining Algorithm for Big Data
    Al-Bana, Mohamed Reda
    Farhan, Marwa Salah
    Othman, Nermin Abdelhakim
    [J]. DATA, 2022, 7 (01)
  • [4] HFIM: a Spark-based hybrid frequent itemset mining algorithm for big data processing
    Sethi, Krishan Kumar
    Ramesh, Dharavath
    [J]. JOURNAL OF SUPERCOMPUTING, 2017, 73 (08): 3652 - 3668
  • [5] HFIM: a Spark-based hybrid frequent itemset mining algorithm for big data processing
    Krishan Kumar Sethi
    Dharavath Ramesh
    [J]. The Journal of Supercomputing, 2017, 73 : 3652 - 3668
  • [6] An Incremental Algorithm for Frequent Itemset Mining on Spark
    Yu, Min
    Zuo, Chuang
    Yuan, Yunpeng
    Yang, Yulu
    [J]. 2017 IEEE 2ND INTERNATIONAL CONFERENCE ON BIG DATA ANALYSIS (ICBDA), 2017: 281 - 285
  • [7] Frequent Itemset Mining for Big Data in social media using ClustBigFIM algorithm
    Gole, Sheela
    Tidke, Bharat
    [J]. 2015 INTERNATIONAL CONFERENCE ON PERVASIVE COMPUTING (ICPC), 2015
  • [8] Frequent Itemset Mining for Big Data
    Moens, Sandy
    Aksehirli, Emin
    Goethals, Bart
    [J]. 2013 IEEE INTERNATIONAL CONFERENCE ON BIG DATA, 2013
  • [9] Frequent Itemset Mining for Big Data
    Chavan, Kiran
    Kulkarni, Priyanka
    Ghodekar, Pooja
    Patil, S. N.
    [J]. 2015 International Conference on Green Computing and Internet of Things (ICGCIoT), 2015: 1365 - 1368
  • [10] Recommendation using Frequent Itemset Mining in Big Data
    Kunjachan, Honeytta
    Hareesh, M. J.
    Sreedevi, K. M.
    [J]. PROCEEDINGS OF THE 2018 SECOND INTERNATIONAL CONFERENCE ON INTELLIGENT COMPUTING AND CONTROL SYSTEMS (ICICCS), 2018: 561 - 566