Improved Approximate Detection of Duplicates for Data Streams Over Sliding Windows

Authors
Hong Shen (沈鸿) [1, 2]
Yu Zhang (张育) [1]
Affiliations
[1] Department of Computer Science and Technology, University of Science and Technology of China
[2] School of Computer Science, University of Adelaide
Funding
National Natural Science Foundation of China
Keywords
data stream; duplicate detection; Bloom filter; approximate query; sliding window
DOI
Not available
Chinese Library Classification (CLC) number
TP311.13
Discipline classification code
1201
Abstract
Detecting duplicates in data streams is an important problem with a wide range of applications. In general, precisely detecting duplicates in an unbounded data stream is infeasible in most streaming scenarios; moreover, the elements of a data stream are inherently time-sensitive. This makes it particularly important to approximately detect duplicates among the newly arrived elements of a data stream within a fixed time frame. In this paper, we present a novel data structure, the Decaying Bloom Filter (DBF), an extension of the Counting Bloom Filter that effectively removes stale elements as new elements continuously arrive over a sliding window. Based on the DBF, we present an efficient algorithm for approximately detecting duplicates over sliding windows. Our algorithm may produce false positive errors but, unlike many previous results, no false negative errors. We analyze its time complexity and detection accuracy and derive a tight upper bound on its false positive rate. For a given space of G bits and a sliding window of size W, our algorithm has an amortized time complexity of O(√(G/W)). Both analytical and experimental results on synthetic data show that our algorithm outperforms previous approaches in both execution time and detection accuracy.
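The abstract describes the DBF only at a high level, so the following Python code is a minimal sketch of the general idea of a decaying counting Bloom filter under our own assumptions. It is not the authors' construction. The assumptions are as follows. The window is the W elements that arrived immediately before the current one. Each cell is a counter in 0..W that uses W.bit_length() bits, so a budget of G bits gives m = G // W.bit_length() counters. An arriving element's k cells are reset to W, and every arrival decrements all positive counters. Cell indices come from blake2b double hashing, which is also an illustrative choice. The class name DecayingBloomFilterSketch and its methods are hypothetical.

A counter only reaches zero after W arrivals without being reset, so a true duplicate inside the window is never missed. This matches the no-false-negative property stated in the abstract. A stale element whose cells were all refreshed by newer elements still yields a false positive. The naive decay loop costs O(m) per arrival, so this sketch does not attain the amortized O(√(G/W)) bound reported in the abstract.

```python
import hashlib


class DecayingBloomFilterSketch:
    """Illustrative decaying Bloom filter over a count-based sliding window.

    Hypothetical sketch, not the authors' implementation. Assumptions:
      * the window is the W elements that arrived before the current one;
      * each of the m cells is a counter in 0..W, using W.bit_length() bits,
        so a budget of G bits yields m = G // W.bit_length() counters;
      * cell indices come from blake2b-based double hashing;
      * every arrival decrements all positive counters (O(m) per arrival).
    """

    def __init__(self, space_bits: int, window: int, num_hashes: int) -> None:
        if window < 1 or num_hashes < 1:
            raise ValueError("window and num_hashes must be positive")
        self.window = window
        self.k = num_hashes
        self.m = space_bits // window.bit_length()
        if self.m < 1:
            raise ValueError("space_bits is too small for a single counter")
        self.counters = [0] * self.m

    def _cells(self, item: str) -> list[int]:
        # Double hashing: cell_i = (h1 + i * h2) mod m, from one 128-bit digest.
        digest = hashlib.blake2b(item.encode("utf-8"), digest_size=16).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:], "little") | 1  # force a nonzero step
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def process(self, item: str) -> bool:
        """Return True if item was probably among the previous W arrivals,
        then record it. False positives are possible; false negatives are not."""
        cells = self._cells(item)
        # 1. Query: report a duplicate only if every cell is still alive.
        seen = all(self.counters[c] > 0 for c in cells)
        # 2. Decay: each live counter ages by one arrival.
        for j, value in enumerate(self.counters):
            if value > 0:
                self.counters[j] = value - 1
        # 3. Insert: give this item's cells a fresh lifetime of W arrivals.
        for c in cells:
            self.counters[c] = self.window
        return seen


dbf = DecayingBloomFilterSketch(space_bits=1 << 16, window=3, num_hashes=4)
flags = [dbf.process(x) for x in ["a", "b", "a", "c", "d", "e", "a"]]
print(flags)
```

In the usage lines, the first call returns False because the filter starts empty. The third call is guaranteed to return True because "a" reappears within the window of 3. The remaining calls return False unless a hash collision produces a false positive. The query-decay-insert order determines which elements count as "in the window." Querying before decaying makes an element visible to exactly the next W arrivals.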
Pages: 973-987
Page count: 15
Related papers
  • Improved Approximate Detection of Duplicates for Data Streams Over Sliding Windows
    Hong Shen, Yu Zhang
    Journal of Computer Science and Technology, 2008, 23(6): 973-987
  • Outlier Detection over Sliding Windows for Probabilistic Data Streams
    Bin Wang, Xiao-Chun Yang, Guo-Ren Wang, Ge Yu
    Journal of Computer Science and Technology, 2010, 25(3): 389-400
  • Approximate Range Emptiness in Constant Time for IoT Data Streams over Sliding Windows
    Xiujun Wang, Zhi Liu, Yangzhao Yang, Xun Shao, Yu Gu, Susumu Ishihara
    2019 28th International Conference on Computer Communication and Networks (ICCCN), 2019
  • Sliding windows over uncertain data streams
    Michele Dallachiesa, Gabriela Jacques-Silva, Buğra Gedik, Kun-Lung Wu, Themis Palpanas
    Knowledge and Information Systems, 2015, 45(1): 159-190
  • RETRACTED: Improved Decaying Bloom Filter for Duplicate Detection in Data Streams Over Sliding Windows (Retracted Article)
    Xiujun Wang, Hong Shen
    ICCSIT 2010: 3rd IEEE International Conference on Computer Science and Information Technology, Vol. 4, 2010: 348-353
  • Sketching asynchronous data streams over sliding windows
    Bojian Xu, Srikanta Tirthapura, Costas Busch
    Distributed Computing, 2008, 20: 359-374