Monitoring Distributed Streams using Convex Decompositions

Cited by: 15
Authors
Lazerson, Arnon [1]
Sharfman, Izchak [1]
Keren, Daniel [2]
Schuster, Assaf [1]
Garofalakis, Minos [3]
Samoladas, Vasilis [3]
Affiliations
[1] Israeli Inst Technol, Fac Comp Sci, Tel Aviv, Israel
[2] Univ Haifa, Dept Comp Sci, IL-31999 Haifa, Israel
[3] Tech Univ Crete, Sch Elect & Comp Engn, Iraklion, Greece
Source
PROCEEDINGS OF THE VLDB ENDOWMENT | 2015, Vol. 8, No. 5
DOI
10.14778/2735479.2735487
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Emerging large-scale monitoring applications rely on continuous tracking of complex data-analysis queries over collections of massive, physically distributed data streams. Thus, in addition to the space- and time-efficiency requirements of conventional stream processing (at each remote monitor site), effective solutions also need to guarantee communication efficiency (over the underlying communication network). The complexity of the monitored query adds to the difficulty of the problem; this is especially true for nonlinear queries (e.g., joins), where no obvious solutions exist for distributing the monitored condition across sites. The recently proposed geometric method, based on the notion of covering spheres, offers a generic methodology for splitting an arbitrary (nonlinear) global condition into a collection of local site constraints, and has been applied to massive distributed stream-monitoring tasks, achieving state-of-the-art performance. In this paper, we present a far more general geometric approach, based on the convex decomposition of an appropriate subset of the domain of the monitoring query, and formally prove that it is always guaranteed to perform at least as well as the covering spheres method. We analyze our approach and demonstrate its effectiveness for the important case of sketch-based approximate tracking for norm, range-aggregate, and join-aggregate queries, which have numerous applications in streaming data analysis. Experimental results on real-life data streams verify the superiority of our approach in practical settings, showing that it substantially outperforms the covering spheres method.
Pages: 545-556
Number of pages: 12
Related papers
50 in total
  • [1] Lightweight Monitoring of Distributed Streams
    Lazerson, Arnon
    Keren, Daniel
    Schuster, Assaf
    ACM TRANSACTIONS ON DATABASE SYSTEMS, 2018, 43 (02)
  • [2] Lightweight Monitoring of Distributed Streams
    Lazerson, Arnon
    Keren, Daniel
    Schuster, Assaf
    KDD'16: PROCEEDINGS OF THE 22ND ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, 2016: 1685 - 1694
  • [3] Thresholded Monitoring in Distributed Data Streams
    Li, Meng
    Dai, Haipeng
    Wang, Xiaoyu
    Xia, Rui
    Liu, Alex X.
    Chen, Guihai
    IEEE-ACM TRANSACTIONS ON NETWORKING, 2020, 28 (03) : 1033 - 1046
  • [4] Thresholded Monitoring in Distributed Data Streams
    Li, Meng
    Dai, Haipeng
    Wang, Xiaoyu
    Xia, Rui
    Liu, Alex X.
    Chen, Guihai
    2019 39TH IEEE INTERNATIONAL CONFERENCE ON DISTRIBUTED COMPUTING SYSTEMS (ICDCS 2019), 2019: 218 - 227
  • [5] Convex Decompositions
    Cervone, Davide P.
    Zwicker, William S.
    JOURNAL OF CONVEX ANALYSIS, 2009, 16 (02) : 367 - 376
  • [6] Monitoring persistent items in the union of distributed streams
    Singh, Sneha Aman
    Tirthapura, Srikanta
    JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2014, 74 (11) : 3115 - 3127
  • [7] Monitoring Least Squares Models of Distributed Streams
    Gabel, Moshe
    Keren, Daniel
    Schuster, Assaf
    KDD'15: PROCEEDINGS OF THE 21ST ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, 2015: 319 - 328
  • [8] On convex decompositions of points
    Hosono, K
    Rappaport, D
    Urabe, M
    DISCRETE AND COMPUTATIONAL GEOMETRY, 2001, 2098 : 149 - 155
  • [9] Monitoring Distributed Data Streams through Node Clustering
    Barouti, Maria
    Keren, Daniel
    Kogan, Jacob
    Malinovsky, Yaakov
    MACHINE LEARNING AND DATA MINING IN PATTERN RECOGNITION, MLDM 2014, 2014, 8556 : 149 - 162
  • [10] Continuous Skyline Monitoring over Distributed Data Streams
    Lu, Hua
    Zhou, Yongluan
    Haustad, Jonas
    SCIENTIFIC AND STATISTICAL DATABASE MANAGEMENT, 2010, 6187 : 565 - +