SWEclat: a frequent itemset mining algorithm over streaming data using Spark Streaming

Cited by: 0

Authors:

Wen Xiao

Juan Hu

Affiliations:

[1] HOHAI University, College of Computer Science and Information

[2] Wanjiang University of Technology, Key Laboratory of Unmanned Aerial Vehicle Development and Data Application of Anhui Higher Education Institutes

[3] Wanjiang University of Technology, Ma’anshan Engineering Technology Research Center for Wireless Sensor Network and IntelliSense

Source:

The Journal of Supercomputing | 2020 / Vol. 76

Keywords:

Frequent itemset mining; Streaming data; Sliding window; Distributed; Spark Streaming;

DOI:

Not available

Chinese Library Classification number:

Subject classification number:

Abstract:

Finding frequent itemsets in continuous streaming data is an important data mining task that is widely used in applications such as network monitoring and Internet of Things data analysis. In the era of big data, a distributed frequent itemset mining algorithm is needed to process massive streaming data. Apache Spark is a unified analytics engine for massive data processing that has been used successfully in many data mining fields. In this paper, we propose SWEclat, a distributed algorithm for mining frequent itemsets over massive streaming data. The algorithm uses a sliding window to process the streaming data and a vertical data structure to store the dataset in the sliding window. It is implemented on Apache Spark and uses Spark RDDs to store both the streaming data and the dataset in vertical data format, so that these RDDs can be divided into partitions for distributed processing. Experimental results show that SWEclat achieves good speedup, parallel scalability, and load balancing.
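
The abstract combines two ideas: a sliding window over the stream, and a vertical layout in which each item maps to the set of transaction IDs that contain it. The following is a minimal single-machine sketch of that combination using an Eclat-style tidset intersection. It is not the authors' SWEclat implementation and does not use Spark; the class name `SlidingWindowEclat`, the batch-based window, and the parameters `window_size` and `min_support` are assumptions made for illustration.

```python
from collections import defaultdict, deque


class SlidingWindowEclat:
    """Illustrative sketch: Eclat over the most recent `window_size` batches."""

    def __init__(self, window_size, min_support):
        self.window_size = window_size    # number of batches kept (assumed parameter)
        self.min_support = min_support    # absolute support threshold (assumed parameter)
        self.window = deque()             # batches currently inside the window
        self.tidsets = defaultdict(set)   # vertical format: item -> transaction ids
        self.next_tid = 0

    def add_batch(self, transactions):
        """Slide the window: evict the oldest batch if full, then index the new one."""
        if len(self.window) == self.window_size:
            for tid, items in self.window.popleft():
                for item in items:
                    self.tidsets[item].discard(tid)
                    if not self.tidsets[item]:
                        del self.tidsets[item]
        batch = []
        for items in transactions:
            tid = self.next_tid
            self.next_tid += 1
            items = frozenset(items)
            batch.append((tid, items))
            for item in items:
                self.tidsets[item].add(tid)
        self.window.append(batch)

    def frequent_itemsets(self):
        """Return {itemset as sorted tuple: support} for the current window."""
        frequent = sorted(
            ((item, tids) for item, tids in self.tidsets.items()
             if len(tids) >= self.min_support),
            key=lambda pair: pair[0],
        )
        result = {}
        self._extend((), frequent, result)
        return result

    def _extend(self, prefix, candidates, result):
        """Depth-first search: grow `prefix` by intersecting tidsets."""
        for i, (item, tids) in enumerate(candidates):
            itemset = prefix + (item,)
            result[itemset] = len(tids)
            extensions = []
            for other, other_tids in candidates[i + 1:]:
                shared = tids & other_tids  # transactions containing both itemsets
                if len(shared) >= self.min_support:
                    extensions.append((other, shared))
            if extensions:
                self._extend(itemset, extensions, result)
```

In this sketch the support of an itemset is simply the size of its tidset, so no rescan of the window is needed once the vertical index has been updated. A distributed version along the lines the abstract describes would partition these tidsets across workers; that part is not shown here.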

Pages: 7619-7634

Page count: 15

50 items in total

[1] SWEclat: a frequent itemset mining algorithm over streaming data using Spark Streaming
Xiao, Wen
Hu, Juan
[J]. JOURNAL OF SUPERCOMPUTING, 2020, 76 (10): 7619 - 7634
[2] An algorithm for in-core frequent itemset mining on streaming data
Jin, RM
Agrawal, G
[J]. FIFTH IEEE INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS, 2005: 210 - 217
[3] Parallel Frequent Itemset Mining on Streaming Data
He, Yanshan
Yue, Min
[J]. 2014 10TH INTERNATIONAL CONFERENCE ON NATURAL COMPUTATION (ICNC), 2014: 725 - 730
[4] Approximate Frequent Itemset Mining for Streaming Data on FPGA
Li, Yubin
Sun, Yuliang
Dai, Guohao
Xu, Qiang
Wang, Yu
Yang, Huazhong
[J]. 2016 26TH INTERNATIONAL CONFERENCE ON FIELD PROGRAMMABLE LOGIC AND APPLICATIONS (FPL), 2016
[5] Finding tendencies in streaming data using Big Data frequent itemset mining
Fernandez-Basso, Carlos
Francisco-Agra, Abel J.
Martin-Bautista, Maria J.
Dolores Ruiz, M.
[J]. KNOWLEDGE-BASED SYSTEMS, 2019, 163 : 666 - 674
[6] A distributed frequent itemset mining algorithm using Spark for Big Data analytics
Feng Zhang
Min Liu
Feng Gui
Weiming Shen
Abdallah Shami
Yunlong Ma
[J]. Cluster Computing, 2015, 18 : 1493 - 1501
[7] A distributed frequent itemset mining algorithm using Spark for Big Data analytics
Zhang, Feng
Liu, Min
Gui, Feng
Shen, Weiming
Shami, Abdallah
Ma, Yunlong
[J]. CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS, 2015, 18 (04): 1493 - 1501
[8] PARASOL: a hybrid approximation approach for scalable frequent itemset mining in streaming data
Yoshitaka Yamamoto
Yasuo Tabei
Koji Iwanuma
[J]. Journal of Intelligent Information Systems, 2020, 55 : 119 - 147
[9] PARASOL: a hybrid approximation approach for scalable frequent itemset mining in streaming data
Yamamoto, Yoshitaka
Tabei, Yasuo
Iwanuma, Koji
[J]. JOURNAL OF INTELLIGENT INFORMATION SYSTEMS, 2020, 55 (01): 119 - 147
[10] An Incremental Algorithm for Frequent Itemset Mining on Spark
Yu, Min
Zuo, Chuang
Yuan, Yunpeng
Yang, Yulu
[J]. 2017 IEEE 2ND INTERNATIONAL CONFERENCE ON BIG DATA ANALYSIS (ICBDA), 2017: 281 - 285