A Shared Execution Strategy for Multiple Pattern Mining Requests over Streaming Data

Authors
Yang, Di [1]
Rundensteiner, Elke A. [1]
Ward, Matthew O. [1]
Affiliations
[1] Worcester Polytechnic Institute, Computer Science Department, Worcester, MA 01609 USA
Source
Proceedings of the VLDB Endowment | 2009 / Vol. 2 / Issue 1
DOI
10.14778/1687627.1687726
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In diverse applications ranging from stock trading to traffic monitoring, popular data streams are typically monitored by multiple analysts for patterns of interest. These analysts may submit similar pattern mining requests, such as cluster detection queries, yet customized with different parameter settings. In this work, we present an efficient shared execution strategy for processing a large number of density-based cluster detection queries with arbitrary parameter settings. Given the high algorithmic complexity of the clustering process and the real-time responsiveness required by streaming applications, serving multiple such queries in a single system is extremely resource intensive. The naive method of detecting and maintaining clusters for different queries independently is often infeasible in practice, as its demands on system resources increase dramatically with the cardinality of the query workload. To overcome this, we analyze the interrelations between the cluster sets identified by queries with different parameter settings, including both pattern-specific and window-specific parameters. We introduce the notion of the growth property among the cluster sets identified by different queries, and characterize the conditions under which it holds. By exploiting this growth property we propose a uniform solution, called Chandi, which represents identified cluster sets as one single compact structure and performs integrated maintenance on them, resulting in significant sharing of computational and memory resources. Our comprehensive experimental study, using real data streams from domains of stock trades and moving object monitoring, demonstrates that Chandi is on average four times faster than the best alternative methods, while using 85% less memory space in our test cases. It also shows that Chandi scales in handling large numbers of queries on the order of hundreds or even thousands under high input data rates.
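The abstract does not define the growth property, so the following is only an illustrative sketch under an assumption: that it resembles the familiar monotonicity of density-based clusters, where, for a fixed neighbor-count threshold, every cluster found with a smaller range threshold is contained in some cluster found with a larger one. The function names (`core_clusters`, `grows_into`), the parameters (`eps`, `min_pts`), and the sample points are all hypothetical, and this is not the Chandi algorithm itself.

```python
from math import dist


def core_clusters(points, eps, min_pts):
    """Group core points into clusters (simplified density-based clustering).

    Assumption for this sketch: a point is "core" if at least min_pts points
    (itself included) lie within distance eps, and two core points belong to
    the same cluster if a chain of core points, each within eps of the next,
    connects them. Border and noise points are ignored.
    """
    n = len(points)
    neighbors = [
        [j for j in range(n) if dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    core = {i for i in range(n) if len(neighbors[i]) >= min_pts}

    clusters, seen = [], set()
    for start in sorted(core):
        if start in seen:
            continue
        # Depth-first traversal over core points reachable from `start`.
        component, stack = set(), [start]
        seen.add(start)
        while stack:
            i = stack.pop()
            component.add(i)
            for j in neighbors[i]:
                if j in core and j not in seen:
                    seen.add(j)
                    stack.append(j)
        clusters.append(frozenset(component))
    return clusters


def grows_into(small, large):
    """True if every cluster in `small` is a subset of some cluster in `large`."""
    return all(any(c <= d for d in large) for c in small)


# Hypothetical data: two unit squares two units apart, plus one outlier.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (3, 0), (3, 1), (4, 0), (4, 1),
          (10, 10)]

tight = core_clusters(points, eps=1.0, min_pts=3)  # stricter query
loose = core_clusters(points, eps=2.0, min_pts=3)  # more relaxed query

print(len(tight), len(loose), grows_into(tight, loose))
```

Under this assumption, a system answering both queries would not need two independent cluster sets: the clusters of the stricter query nest inside those of the relaxed one, which is the kind of containment a single shared structure could exploit.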
Pages: 874-885
Page count: 12