Similarity Join Processing on Uncertain Data Streams

Cited by: 14
Authors
Lian, Xiang [1]
Chen, Lei [1]
Affiliation
[1] Hong Kong Univ Sci & Technol, Dept Comp Sci & Engn, Kowloon, Hong Kong, Peoples R China
Keywords
Join on uncertain data streams; adaptive superset prejoin
DOI
10.1109/TKDE.2010.208
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Similarity join processing in streaming environments has many practical applications, such as sensor networks and object tracking and monitoring. Previous work usually assumes that stream processing is conducted over precise data. In this paper, we study the important problem of similarity join processing on stream data that inherently contain uncertainty (called uncertain data streams), where the incoming data at each time stamp are uncertain and imprecise. Specifically, we formalize this problem as join on uncertain data streams (USJ), which can guarantee the accuracy of USJ answers over uncertain data. To tackle efficiency and effectiveness challenges such as limited memory and short response time requirements, we propose effective pruning methods at both the object and sample levels to filter out false alarms. We integrate the proposed pruning methods into an efficient query procedure that incrementally maintains the USJ answers. Most importantly, we further design a novel strategy, adaptive superset prejoin (ASP), to maintain a superset of USJ candidate pairs. ASP is guided by our proposed formal cost model so that the average USJ processing cost is minimized. We have conducted extensive experiments to demonstrate the efficiency and effectiveness of the proposed approaches.
Pages: 1718-1734
Page count: 17