Sliding windows over uncertain data streams

Cited by: 13
Authors
Dallachiesa, Michele [1, 2]
Jacques-Silva, Gabriela [2 ]
Gedik, Bugra [3 ]
Wu, Kun-Lung [2 ]
Palpanas, Themis [1, 4]
Affiliations
[1] Univ Trento, Trento, Italy
[2] IBM TJ Watson Res Ctr, Yorktown Hts, NY USA
[3] Bilkent Univ, Ankara, Turkey
[4] Paris Descartes Univ, Paris, France
Keywords
Data stream processing; Sliding windows; Uncertainty management
DOI
10.1007/s10115-014-0804-5
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Uncertain data streams can have tuples with both value and existential uncertainty. A tuple has value uncertainty when it can assume multiple possible values. A tuple is existentially uncertain when the sum of the probabilities of its possible values is less than 1. A situation where existential uncertainty can arise is when applying relational operators to streams with value uncertainty. Several prior works have focused on querying and mining data streams with both value and existential uncertainty. However, none of them has studied, in depth, the implications of existential uncertainty on sliding window processing, even though it naturally arises when processing uncertain data. In this work, we study the challenges arising from existential uncertainty, more specifically the management of count-based sliding windows, which are a basic building block of stream processing applications. We extend the semantics of sliding windows to define the novel concept of uncertain sliding windows and provide both exact and approximate algorithms for managing windows under existential uncertainty. We also show how current state-of-the-art techniques for answering similarity join queries can be easily adapted to be used with uncertain sliding windows. We evaluate our proposed techniques under a variety of configurations using real data. The results show that the algorithms used to maintain uncertain sliding windows can operate efficiently while providing a high-quality approximation in query answering. In addition, we show that sort-based similarity join algorithms can perform better than index-based techniques (on 17 real datasets) when the number of possible values per tuple is low, as in many real-world applications.
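The definitions in the abstract can be illustrated with a small Python sketch. It is not the authors' algorithm. It assumes, purely for illustration, that each tuple is a discrete distribution stored as a dict from possible value to probability and that tuples are independent. The names `Pdf`, `existence_probability`, `select`, and `count_distribution` are hypothetical. The sketch shows how a selection over a value-uncertain tuple yields an existentially uncertain one, and why the number of tuples actually present among a set of candidates becomes a random quantity, which is what complicates a count-based window.

```python
from typing import Callable, Dict, List

# Hypothetical representation: possible value -> probability.
Pdf = Dict[float, float]


def existence_probability(pdf: Pdf) -> float:
    """Sum of the probabilities of the possible values.

    A sum below 1 means the tuple is existentially uncertain.
    """
    return sum(pdf.values())


def select(pdf: Pdf, predicate: Callable[[float], bool]) -> Pdf:
    """Keep only the possible values that satisfy the predicate.

    The dropped probability mass is the chance that the tuple
    does not appear in the operator's output at all.
    """
    return {v: p for v, p in pdf.items() if predicate(v)}


def count_distribution(exist_probs: List[float]) -> List[float]:
    """Distribution of how many tuples exist, assuming independence.

    Entry k of the result is the probability that exactly k of the
    tuples exist (dynamic programming over the tuples).
    """
    dist = [1.0]
    for p in exist_probs:
        nxt = [0.0] * (len(dist) + 1)
        for k, q in enumerate(dist):
            nxt[k] += q * (1.0 - p)   # this tuple is absent
            nxt[k + 1] += q * p       # this tuple is present
        dist = nxt
    return dist


# A tuple with value uncertainty only: its probabilities sum to 1.
reading = {3.0: 0.5, 7.0: 0.5}

# After a selection "value > 5" the surviving mass is 0.5, so the
# output tuple is existentially uncertain.
filtered = select(reading, lambda v: v > 5)

# Two candidate tuples for a count-based window.
candidates = [filtered, {1.0: 0.5}]
dist = count_distribution([existence_probability(t) for t in candidates])
```

With these inputs, `dist[k]` is the probability that exactly `k` of the two candidate tuples exist, so a window defined as "the last w tuples" has no single fixed content; it depends on which tuples are actually present.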
Pages: 159-190
Number of pages: 32