A Shared Execution Strategy for Multiple Pattern Mining Requests over Streaming Data

Cited by: 0

Authors:

Yang, Di [1]

Rundensteiner, Elke A. [1]

Ward, Matthew O. [1]

Affiliation:

[1] Worcester Polytech Inst, Comp Sci Dept, Worcester, MA 01609 USA

Source:

PROCEEDINGS OF THE VLDB ENDOWMENT | 2009 / Vol. 2 / Issue 01

Keywords:

DOI:

10.14778/1687627.1687726

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

In diverse applications ranging from stock trading to traffic monitoring, popular data streams are typically monitored by multiple analysts for patterns of interest. These analysts may submit similar pattern mining requests, such as cluster detection queries, yet customized with different parameter settings. In this work, we present an efficient shared execution strategy for processing a large number of density-based cluster detection queries with arbitrary parameter settings. Given the high algorithmic complexity of the clustering process and the real-time responsiveness required by streaming applications, serving multiple such queries in a single system is extremely resource intensive. The naive method of detecting and maintaining clusters for different queries independently is often infeasible in practice, as its demands on system resources increase dramatically with the cardinality of the query workload. To overcome this, we analyze the interrelations between the cluster sets identified by queries with different parameter settings, including both pattern-specific and window-specific parameters. We introduce the notion of the growth property among the cluster sets identified by different queries, and characterize the conditions under which it holds. By exploiting this growth property we propose a uniform solution, called Chandi, which represents identified cluster sets as one single compact structure and performs integrated maintenance on them, resulting in significant sharing of computational and memory resources. Our comprehensive experimental study, using real data streams from the domains of stock trading and moving object monitoring, demonstrates that Chandi is on average four times faster than the best alternative methods, while using 85% less memory space in our test cases. It also shows that Chandi scales to large numbers of queries, on the order of hundreds or even thousands, under high input data rates.
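The relationship between cluster sets of differently parameterized queries can be made concrete with a small sketch. It is not Chandi and does not reproduce the paper's definition of the growth property; it only checks one containment relation that holds for a simplified density-based clustering when two queries share a range threshold and differ in their count threshold. The function names `core_clusters` and `nested`, the one-dimensional integer data, and the parameter values are all illustrative assumptions.

```python
from itertools import combinations


def core_clusters(points, eps, min_pts):
    """Cluster the core points of a 1-D data set (simplified, DBSCAN-like).

    A point is 'core' if at least min_pts points, itself included, lie
    within distance eps. Core points within eps of each other are linked,
    and each connected group of core points forms one cluster.
    Assumes the points are distinct.
    """
    core = [p for p in points
            if sum(abs(p - q) <= eps for q in points) >= min_pts]

    # Union-find over the core points.
    parent = {p: p for p in core}

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path halving
            p = parent[p]
        return p

    for a, b in combinations(core, 2):
        if abs(a - b) <= eps:
            parent[find(a)] = find(b)

    groups = {}
    for p in core:
        groups.setdefault(find(p), set()).add(p)
    return [frozenset(g) for g in groups.values()]


def nested(strict, loose):
    """True if every cluster in `strict` lies inside some cluster in `loose`."""
    return all(any(s <= l for l in loose) for s in strict)


# Hypothetical data: two dense regions and one isolated point.
points = [0, 5, 10, 15, 20, 50, 54, 58, 90]
eps = 6

loose = core_clusters(points, eps, min_pts=2)   # more permissive query
strict = core_clusters(points, eps, min_pts=3)  # more restrictive query

print(sorted(sorted(c) for c in loose))
print(sorted(sorted(c) for c in strict))
print(nested(strict, loose))
```

Because every cluster of the restrictive query sits inside a cluster of the permissive one in this simplified setting, the two results could in principle be stored as one layered structure instead of two independent cluster sets, which is the kind of sharing the abstract describes.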


Pages: 874-885

Page count: 12

50 items in total

[1] Shared Execution Strategy for Neighbor-Based Pattern Mining Requests over Streaming Windows
Yang, Di
Rundensteiner, Elke A.
Ward, Matthew O.
[J]. ACM TRANSACTIONS ON DATABASE SYSTEMS, 2012, 37 (01)
[2] Replica Scheduling Strategy for Streaming Data Mining
Li, Shufan
Yu, Siyuan
Xiao, Fang
[J]. INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2022, 13 (05) : 10 - 19
[3] Shared Execution Techniques for Business Data Analytics over Big Data Streams
Uzunbaz, Serkan
Aref, Walid G.
[J]. PROCEEDINGS OF THE 32TH INTERNATIONAL CONFERENCE ON SCIENTIFIC AND STATISTICAL DATABASE MANAGEMENT, SSDBM 2020, 2020
[4] SWEclat: a frequent itemset mining algorithm over streaming data using Spark Streaming
Wen Xiao
Juan Hu
[J]. The Journal of Supercomputing, 2020, 76 : 7619 - 7634
[5] SWEclat: a frequent itemset mining algorithm over streaming data using Spark Streaming
Xiao, Wen
Hu, Juan
[J]. JOURNAL OF SUPERCOMPUTING, 2020, 76 (10) : 7619 - 7634
[6] Secure and unifold mining model for pattern discovery from streaming data
Rao, Annaluri Sreenivasa
Ramana, Attili Venkata
Prasad, Kalli Srinivasa Nageswara
[J]. Rao, Annaluri Sreenivasa (annaluri.rao@gmail.com), 1600, Inderscience Publishers (14): : 136 - 145
[7] A New Algorithm of Mining High Utility Sequential Pattern in Streaming Data
Tang, Huijun
Liu, Yangguang
Wang, Le
[J]. INTERNATIONAL JOURNAL OF COMPUTATIONAL INTELLIGENCE SYSTEMS, 2019, 12 (01) : 342 - 350
[8] A Multi - flow Streaming Data Frequent Pattern Mining Adaptive Algorithm
Feng, Fan
Liao, Husheng
Jin, Xueyun
[J]. 5TH INTERNATIONAL CONFERENCE ON ADVANCED COMPUTER SCIENCE APPLICATIONS AND TECHNOLOGIES (ACSAT 2017), 2017 : 142 - 149
[9] A New Algorithm of Mining High Utility Sequential Pattern in Streaming Data
Huijun Tang
Yangguang Liu
Le Wang
[J]. International Journal of Computational Intelligence Systems, 2018, 12 (1) : 342 - 350
[10] A Comparative Analysis of Frequent Pattern Mining Algorithms Used for Streaming Data
Shalini
Jain, Sanjay Kumar
[J]. 2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTING, COMMUNICATION AND AUTOMATION (ICCCA), 2017 : 250 - 255
