SWEclat: a frequent itemset mining algorithm over streaming data using Spark Streaming

Cited by: 0
Authors
Wen Xiao
Juan Hu
Affiliations
[1] Hohai University, College of Computer Science and Information
[2] Wanjiang University of Technology, Key Laboratory of Unmanned Aerial Vehicle Development and Data Application of Anhui Higher Education Institutes
[3] Wanjiang University of Technology, Ma’anshan Engineering Technology Research Center for Wireless Sensor Network and IntelliSense
Source
Journal of Supercomputing, 2020, 76(10)
Keywords
Frequent itemset mining; Streaming data; Sliding window; Distributed; Spark Streaming
DOI
Not available
Abstract
Finding frequent itemsets in continuous streaming data is an important data mining task, widely used in network monitoring, Internet of Things (IoT) data analysis, and similar applications. In the era of big data, distributed frequent itemset mining algorithms are needed to process massive streaming data. Apache Spark is a unified analytics engine for large-scale data processing and has been applied successfully in many data mining fields. In this paper, we propose SWEclat, a distributed algorithm for mining frequent itemsets over massive streaming data. The algorithm processes the stream with a sliding window and stores the dataset inside the window in a vertical data format (each item is mapped to the set of IDs of the transactions that contain it). It is implemented on Apache Spark and uses Spark RDDs to hold both the streaming data and the vertically formatted dataset, so that these RDDs can be divided into partitions for distributed processing. Experimental results show that SWEclat achieves good speedup, parallel scalability, and load balancing.
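The abstract names the ingredients (a sliding window, a vertical item-to-tidset layout, and Eclat-style mining) without showing how they interact. The following is a minimal single-machine Python sketch of one way they could fit together, assuming a count-based window made of whole batches and an absolute minimum-support count. All class and function names (`SlidingWindowTidsets`, `mine_window`, `mine_class`, `eclat`) are hypothetical, plain Python sets stand in for Spark RDDs, and this is not the authors' SWEclat implementation.

```python
from collections import deque


class SlidingWindowTidsets:
    """Hypothetical count-based sliding window over the last `window_size`
    batches, stored vertically as item -> set of transaction ids (tidset)."""

    def __init__(self, window_size):
        self.window_size = window_size
        self.batches = deque()  # each entry: list of (tid, frozenset of items)
        self.tidsets = {}       # item -> tids of window transactions containing it
        self.next_tid = 0

    def add_batch(self, transactions):
        batch = []
        for items in transactions:
            tid = self.next_tid
            self.next_tid += 1
            batch.append((tid, frozenset(items)))
            for item in set(items):
                self.tidsets.setdefault(item, set()).add(tid)
        self.batches.append(batch)
        # Slide: evict the oldest batch once the window holds too many batches.
        if len(self.batches) > self.window_size:
            for tid, items in self.batches.popleft():
                for item in items:
                    tids = self.tidsets[item]
                    tids.discard(tid)
                    if not tids:  # item no longer occurs in the window
                        del self.tidsets[item]


def eclat(prefix, members, min_support, out):
    """Depth-first Eclat over one equivalence class. `members` holds
    (item, tidset) pairs, where tidset covers prefix + (item,)."""
    for i, (item, tids) in enumerate(members):
        itemset = prefix + (item,)
        out[itemset] = len(tids)  # support = size of the tidset
        # Child class [itemset]: intersect with every later member's tidset.
        child = []
        for other, other_tids in members[i + 1:]:
            joined = tids & other_tids
            if len(joined) >= min_support:
                child.append((other, joined))
        if child:
            eclat(itemset, child, min_support, out)


def mine_class(i, frequent, min_support):
    """Mine every frequent itemset whose smallest item is frequent[i][0]."""
    item, tids = frequent[i]
    out = {(item,): len(tids)}
    child = [(o, tids & ot) for o, ot in frequent[i + 1:]]
    child = [(o, t) for o, t in child if len(t) >= min_support]
    eclat((item,), child, min_support, out)
    return out


def mine_window(window, min_support):
    # Frequent 1-itemsets, sorted by item so each itemset is generated once.
    frequent = sorted(
        ((item, tids) for item, tids in window.tidsets.items()
         if len(tids) >= min_support),
        key=lambda pair: pair[0],
    )
    result = {}
    # Each top-level class is independent of the others (see note below).
    for i in range(len(frequent)):
        result.update(mine_class(i, frequent, min_support))
    return result


if __name__ == "__main__":
    window = SlidingWindowTidsets(window_size=2)
    window.add_batch([["a", "b", "c"], ["a", "b"]])  # tids 0, 1
    window.add_batch([["a", "c"], ["b", "c"]])       # tids 2, 3
    print(mine_window(window, min_support=2))
    window.add_batch([["a", "b"]])                   # tid 4; first batch slides out
    print(mine_window(window, min_support=2))
```

In this example the first call reports `('a', 'b')`, `('a', 'c')` and `('b', 'c')` with support 2, and after the first batch slides out only the single items remain frequent. Because each top-level prefix class in `mine_window` depends only on the shared list of frequent 1-itemsets, the classes could in principle be mined in parallel (for example, one class per Spark partition); whether SWEclat divides its work this way is not stated in the abstract.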
Pages: 7619-7634
Number of pages: 15
Related papers (50 in total)
  • [1] SWEclat: a frequent itemset mining algorithm over streaming data using Spark Streaming
    Xiao, Wen
    Hu, Juan
    [J]. Journal of Supercomputing, 2020, 76(10): 7619-7634
  • [2] An algorithm for in-core frequent itemset mining on streaming data
    Jin, RM
    Agrawal, G
    [J]. Fifth IEEE International Conference on Data Mining, Proceedings, 2005: 210-217
  • [3] Parallel Frequent Itemset Mining on Streaming Data
    He, Yanshan
    Yue, Min
    [J]. 2014 10th International Conference on Natural Computation (ICNC), 2014: 725-730
  • [4] Approximate Frequent Itemset Mining for Streaming Data on FPGA
    Li, Yubin
    Sun, Yuliang
    Dai, Guohao
    Xu, Qiang
    Wang, Yu
    Yang, Huazhong
    [J]. 2016 26th International Conference on Field Programmable Logic and Applications (FPL), 2016
  • [5] Finding tendencies in streaming data using Big Data frequent itemset mining
    Fernandez-Basso, Carlos
    Francisco-Agra, Abel J.
    Martin-Bautista, Maria J.
    Dolores Ruiz, M.
    [J]. Knowledge-Based Systems, 2019, 163: 666-674
  • [6] A distributed frequent itemset mining algorithm using Spark for Big Data analytics
    Zhang, Feng
    Liu, Min
    Gui, Feng
    Shen, Weiming
    Shami, Abdallah
    Ma, Yunlong
    [J]. Cluster Computing: The Journal of Networks, Software Tools and Applications, 2015, 18(4): 1493-1501
  • [7] PARASOL: a hybrid approximation approach for scalable frequent itemset mining in streaming data
    Yamamoto, Yoshitaka
    Tabei, Yasuo
    Iwanuma, Koji
    [J]. Journal of Intelligent Information Systems, 2020, 55(1): 119-147
  • [8] An Incremental Algorithm for Frequent Itemset Mining on Spark
    Yu, Min
    Zuo, Chuang
    Yuan, Yunpeng
    Yang, Yulu
    [J]. 2017 IEEE 2nd International Conference on Big Data Analysis (ICBDA), 2017: 281-285