Sliding windows over uncertain data streams

Cited by: 13
Authors
Dallachiesa, Michele [1 ,2 ]
Jacques-Silva, Gabriela [2 ]
Gedik, Bugra [3 ]
Wu, Kun-Lung [2 ]
Palpanas, Themis [1 ,4 ]
Affiliations
[1] Univ Trento, Trento, Italy
[2] IBM TJ Watson Res Ctr, Yorktown Hts, NY USA
[3] Bilkent Univ, Ankara, Turkey
[4] Paris Descartes Univ, Paris, France
Keywords
Data stream processing; Sliding windows; Uncertainty management
DOI
10.1007/s10115-014-0804-5
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Uncertain data streams can have tuples with both value and existential uncertainty. A tuple has value uncertainty when it can assume multiple possible values. A tuple is existentially uncertain when the sum of the probabilities of its possible values is less than 1. A situation where existential uncertainty can arise is when applying relational operators to streams with value uncertainty. Several prior works have focused on querying and mining data streams with both value and existential uncertainty. However, none of them has studied, in depth, the implications of existential uncertainty for sliding window processing, even though it naturally arises when processing uncertain data. In this work, we study the challenges arising from existential uncertainty, more specifically the management of count-based sliding windows, which are a basic building block of stream processing applications. We extend the semantics of sliding windows to define the novel concept of uncertain sliding windows and provide both exact and approximate algorithms for managing windows under existential uncertainty. We also show how current state-of-the-art techniques for answering similarity join queries can be easily adapted to be used with uncertain sliding windows. We evaluate our proposed techniques under a variety of configurations using real data. The results show that the algorithms used to maintain uncertain sliding windows can operate efficiently while providing a high-quality approximation in query answering. In addition, we show that sort-based similarity join algorithms can perform better than index-based techniques (on 17 real datasets) when the number of possible values per tuple is low, as in many real-world applications.
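The definitions in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm. The tuple representation (a list of `(value, probability)` alternatives), the function names, and the assumption that tuples exist independently of each other are all hypothetical choices made for this illustration. The sketch computes a tuple's existence probability and the distribution of how many of the most recent tuples actually exist, which is the quantity that makes a count-based window boundary uncertain.

```python
from typing import List, Tuple

# Hypothetical representation: an uncertain tuple is a list of
# (value, probability) alternatives.
UncertainTuple = List[Tuple[float, float]]


def existence_probability(t: UncertainTuple) -> float:
    """Probability that the tuple exists: the sum over its possible values."""
    return sum(p for _, p in t)


def is_existentially_uncertain(t: UncertainTuple, eps: float = 1e-9) -> bool:
    """True when the probabilities of the alternatives sum to less than 1."""
    return existence_probability(t) < 1.0 - eps


def count_distribution(tuples: List[UncertainTuple]) -> List[float]:
    """Return dist, where dist[k] = P(exactly k of the tuples exist).

    Assumes tuples exist independently (an assumption of this sketch).
    """
    dist = [1.0]  # with no tuples seen, zero tuples exist with probability 1
    for t in tuples:
        p = existence_probability(t)
        nxt = [0.0] * (len(dist) + 1)
        for k, q in enumerate(dist):
            nxt[k] += q * (1.0 - p)  # this tuple does not exist
            nxt[k + 1] += q * p      # this tuple exists
        dist = nxt
    return dist


# Sample stream: the first tuple has value uncertainty only,
# the other two are existentially uncertain.
stream: List[UncertainTuple] = [
    [(1.0, 0.5), (2.0, 0.5)],
    [(3.0, 0.6)],
    [(4.0, 0.2), (5.0, 0.3)],
]
```

For the sample stream, the existence probabilities are 1.0, 0.6, and 0.5, and `count_distribution(stream)` is approximately `[0, 0.2, 0.5, 0.3]`. A count-based window that should hold two tuples is therefore already filled by these three arrivals only with probability about 0.8.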
Pages: 159 - 190
Page count: 32
Related papers
50 in total
  • [41] Find recent frequent items with sliding windows in data streams
    Ren, Jiadong
    Li, Ke
    [J]. 2007 THIRD INTERNATIONAL CONFERENCE ON INTELLIGENT INFORMATION HIDING AND MULTIMEDIA SIGNAL PROCESSING, VOL II, PROCEEDINGS, 2007, : 625 - 628
  • [42] Partition-Based Clustering with Sliding Windows for Data Streams
    Youn, Jonghem
    Choi, Jihun
    Shim, Junho
    Lee, Sang-goo
    [J]. DATABASE SYSTEMS FOR ADVANCED APPLICATIONS (DASFAA 2017), PT II, 2017, 10178 : 289 - 303
  • [43] A basic-window based priority-sample algorithm for sliding windows over data streams
    Zhang, Longbo
    Li, Zhanhuai
    Yu, Min
    Jiang, Yun
    [J]. 2007 INTERNATIONAL SYMPOSIUM ON COMPUTER SCIENCE & TECHNOLOGY, PROCEEDINGS, 2007, : 316 - 319
  • [44] RLC: ranking lag correlations with flexible sliding windows in data streams
    Wu, Shanshan
    Lin, Huaizhong
    Wang, Wenxiang
    Lu, Dongming
    U, Leong Hou
    Gao, Yunjun
    [J]. PATTERN ANALYSIS AND APPLICATIONS, 2017, 20 (02) : 601 - 611
  • [45] An EM-Based Algorithm for Clustering Data Streams in Sliding Windows
    Dang, Xuan Hong
    Lee, Vincent
    Ng, Wee Keong
    Ciptadi, Arridhang
    Ong, Kok Leong
    [J]. DATABASE SYSTEMS FOR ADVANCED APPLICATIONS, PROCEEDINGS, 2009, 5463 : 230 - +
  • [46] RLC: ranking lag correlations with flexible sliding windows in data streams
    Shanshan Wu
    Huaizhong Lin
    Wenxiang Wang
    Dongming Lu
    Leong Hou U
    Yunjun Gao
    [J]. Pattern Analysis and Applications, 2017, 20 : 601 - 611
  • [47] Efficient pattern matching over uncertain data streams
    Lian, Xiang
    Lei, Chen
    [J]. HKIE Transactions Hong Kong Institution of Engineers, 2009, 16 (04) : 10 - 19
  • [48] STAGGER: Periodicity mining of data streams using expanding sliding windows
    Elfeky, Mohamed G.
    Aref, Walid G.
    Elmagarmid, Ahmed K.
    [J]. ICDM 2006: SIXTH INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS, 2006, : 188 - +
  • [49] Continuous monitoring of skylines over uncertain data streams
    Ding, Xiaofeng
    Lian, Xiang
    Chen, Lei
    Jin, Hai
    [J]. INFORMATION SCIENCES, 2012, 184 (01) : 196 - 214
  • [50] Distributed streams algorithms for sliding windows
    Gibbons, PB
    Tirthapura, S
    [J]. THEORY OF COMPUTING SYSTEMS, 2004, 37 (03) : 457 - 478