Join Queries on Uncertain Data: Semantics and Efficient Processing

Author
Ge, Tingjian [1]
Affiliation
[1] Univ Kentucky, Dept Comp Sci, Lexington, KY 40506 USA
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Uncertain data is quite common nowadays in a variety of modern database applications. At the same time, the join operation is one of the most important but expensive operations in SQL. However, join queries on uncertain data have not been adequately addressed thus far. In this paper, we study the SQL join operation on uncertain attributes. We observe and formalize two kinds of join operations on such data, namely v-join and d-join. They are each useful for different applications. Using probability theory, we then devise efficient query processing algorithms for these join operations. Specifically, we use probability bounds that are based on the moments of random variables to either early accept or early reject a candidate v-join result tuple. We also devise an indexing mechanism and an algorithm called Two-End Zigzag Join to further save I/O costs. For d-join, we first observe that it can be reduced to a special form of similarity join in a multidimensional space. We then design an efficient algorithm called condensed d-join and an optimal condensation scheme based on dynamic programming. Finally, we perform a comprehensive empirical study using both real datasets and synthetic datasets.
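The early accept / early reject idea in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the v-join predicate is assumed to have the form Pr(|X - Y| <= eps) >= tau for two independent uncertain values X and Y, only their means and variances are used, and the bounds are the textbook Markov (on the second moment) and Cantelli inequalities, which may differ from the bounds the paper derives. The function name `classify_candidate` and its parameters are hypothetical.

```python
def classify_candidate(mean_x, var_x, mean_y, var_y, eps, tau):
    """Classify a candidate pair using only first and second moments.

    Assumed predicate (hypothetical): Pr(|X - Y| <= eps) >= tau,
    with X and Y independent, eps > 0 and 0 < tau <= 1.
    Returns "accept", "reject", or "undecided".
    """
    if eps <= 0:
        raise ValueError("eps must be positive")

    # Moments of the difference D = X - Y (independence assumed).
    mu = mean_x - mean_y
    var = var_x + var_y

    # Lower bound via Markov's inequality applied to D^2:
    #   Pr(|D| >= eps) <= E[D^2] / eps^2 = (var + mu^2) / eps^2
    lower = max(0.0, 1.0 - (var + mu * mu) / (eps * eps))
    if lower >= tau:
        return "accept"  # the probability is provably high enough

    # Upper bound via Cantelli's inequality, usable only when the mean
    # of D lies outside [-eps, eps] by a positive gap:
    #   Pr(|D| <= eps) <= var / (var + gap^2)
    gap = abs(mu) - eps
    if gap > 0:
        upper = var / (var + gap * gap)
        if upper < tau:
            return "reject"  # the probability is provably too low

    # The bounds are inconclusive; an exact probability computation
    # (not shown here) would be needed for this pair.
    return "undecided"


if __name__ == "__main__":
    print(classify_candidate(0.0, 0.01, 0.0, 0.01, eps=1.0, tau=0.9))
    print(classify_candidate(10.0, 0.01, 0.0, 0.01, eps=1.0, tau=0.5))
    print(classify_candidate(1.0, 1.0, 0.0, 1.0, eps=1.0, tau=0.5))
```

Under these assumptions, only the pairs that come back as "undecided" would need the expensive exact evaluation, which is the kind of saving the abstract attributes to moment-based bounds.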
Pages: 697-708
Number of pages: 12