MUD: Mapping-based query processing for high-dimensional uncertain data

Cited by: 5
Authors
Shou, Lidan [1]
Zhang, Xiaolong [1]
Chen, Gang [1]
Gao, Yuan [1]
Chen, Ke [1]
Affiliations
[1] Zhejiang Univ, Coll Comp Sci & Technol, Hangzhou 310027, Peoples R China
Funding
U.S. National Science Foundation;
Keywords
High-dimensional uncertain data; Probabilistic threshold range query; NEAREST-NEIGHBOR QUERIES; RANKING; SEARCH;
DOI
10.1016/j.ins.2012.02.023
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Many real-world applications require management of uncertain data that are modeled as objects in high-dimensional space with imprecise values. In such applications, data objects are typically associated with probability density functions. A fundamental operation on such uncertain data is the probabilistic-threshold range query (PTRQ), which retrieves the objects appearing in the query region with probabilities no less than a specified value. In this paper, we propose a novel framework called MUD for efficient processing of PTRQs on high-dimensional uncertain data. We first propose a cost-effective pruning technique based on a very simple form of probabilistic pruning information (PPI), namely the probabilistic quantiles. Then we map high-dimensional uncertain objects to a single-dimensional space, where the quantiles of uncertain objects can be indexed using the existing single-dimensional indices such as the B+-tree. Each PTRQ in the high-dimensional space is transformed into multiple range queries on the single-dimensional space and evaluated there. We also discuss a method to optimize the indexing scheme for MUD. Specifically, we formulate a mathematical model for measuring the "pruning power" of quantiles, and propose a dynamic programming algorithm which selects the "best" quantiles for mapping and indexing. We perform extensive experiments on both synthetic and real data sets. Our experimental results reveal that the MUD framework is both effective and efficient for processing PTRQs on high-dimensional uncertain data, and it can significantly outperform state-of-the-art schemes. (C) 2012 Elsevier Inc. All rights reserved.
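The quantile-based pruning idea in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: each uncertain object is represented by a finite set of sample points instead of a continuous probability density function, one pair of per-dimension quantiles is kept per object at a single level `p`, and the appearance probability is estimated as the fraction of samples inside the query box. All function names (`quantile_bounds`, `summarize`, `can_prune`, `appearance_probability`, `ptrq`) are hypothetical. The mapping to a single-dimensional space, the B+-tree index, and the dynamic programming step for quantile selection are not reproduced here.

```python
import math


def quantile_bounds(samples, p):
    """Return (lo, hi) for one dimension: at most a fraction p of the samples
    lies strictly below lo, and at most a fraction p lies strictly above hi."""
    xs = sorted(samples)
    n = len(xs)
    m = min(math.floor(p * n), n - 1)  # samples allowed outside on each side
    return xs[m], xs[n - 1 - m]


def summarize(points, p):
    """Per-dimension quantile pairs for one object (assumed pruning summary)."""
    dims = len(points[0])
    return [quantile_bounds([pt[i] for pt in points], p) for i in range(dims)]


def can_prune(bounds, box):
    """True if, in some dimension, the query interval [a, b] lies entirely
    above hi or entirely below lo. The object then appears in the box with
    probability at most p, so it cannot qualify when p < tau."""
    return any(a > hi or b < lo for (lo, hi), (a, b) in zip(bounds, box))


def appearance_probability(points, box):
    """Fraction of sample points that fall inside the axis-aligned box."""
    inside = sum(
        all(a <= x <= b for x, (a, b) in zip(pt, box)) for pt in points
    )
    return inside / len(points)


def ptrq(objects, box, tau, p):
    """Filter-and-refine evaluation of a probabilistic-threshold range query."""
    if not p < tau:
        raise ValueError("the quantile level p must be smaller than tau")
    # In a real system the summaries would be precomputed and indexed.
    summaries = {name: summarize(pts, p) for name, pts in objects.items()}
    result = []
    for name, points in objects.items():
        if can_prune(summaries[name], box):
            continue  # filtered out without touching the full sample set
        if appearance_probability(points, box) >= tau:
            result.append(name)
    return result


# Two toy 2-D objects with ten sample points each (made-up data).
objects = {
    "A": [(float(i), 0.0) for i in range(10)],
    "B": [(float(i) + 10.0, 0.0) for i in range(10)],
}
box = [(8.0, 20.0), (-1.0, 1.0)]
print(ptrq(objects, box, tau=0.3, p=0.2))  # ['B']
```

In this toy run, object A is discarded by the quantile test alone: its upper quantile in the first dimension (7.0) lies below the lower query bound (8.0), so at most 20% of its samples can fall inside the box. Only object B proceeds to the exact probability computation.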
Pages: 147 - 168
Page count: 22
Related papers
50 in total
  • [1] Similarity Query Processing for High-Dimensional Data
    Qin, Jianbin
    Wang, Wei
    Xiao, Chuan
    Zhang, Ying
    [J]. PROCEEDINGS OF THE VLDB ENDOWMENT, 2020, 13 (12) : 3437 - 3440
  • [2] High-Dimensional Similarity Query Processing for Data Science
    Qin, Jianbin
    Wang, Wei
    Xiao, Chuan
    Zhang, Ying
    Wang, Yaoshu
    [J]. KDD '21: PROCEEDINGS OF THE 27TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY & DATA MINING, 2021 : 4062 - 4063
  • [3] Efficient Parallel Skyline Query Processing for High-Dimensional Data
    Tang, Mingjie
    Yu, Yongyang
    Aref, Walid G.
    Malluhi, Qutaibah M.
    Ouzzani, Mourad
    [J]. 2019 IEEE 35TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2019), 2019 : 2113 - 2114
  • [4] PROM: Efficient matching query processing on high-dimensional data
    Ma, Chunyang
    Zhou, Yongluan
    Shou, Lidan
    Chen, Gang
    [J]. INFORMATION SCIENCES, 2015, 322 : 1 - 19
  • [5] Efficient Parallel Skyline Query Processing for High-Dimensional Data
    Tang, Mingjie
    Yu, Yongyang
    Aref, Walid G.
    Malluhi, Qutaibah M.
    Ouzzani, Mourad
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2018, 30 (10) : 1838 - 1851
  • [6] An efficient algorithm for hyperspherical range query processing in high-dimensional data space
    Lee, DH
    Heu, S
    Kim, HJ
    [J]. INFORMATION PROCESSING LETTERS, 2002, 83 (02) : 115 - 123
  • [7] Privacy-Preserving Range Query for High-dimensional Uncertain Data in A Two-party Scenario
    Su Shenghao
    Guo Cheng
    Tian Pengxu
    Tang Xinyu
    [J]. 2021 IEEE CONFERENCE ON DEPENDABLE AND SECURE COMPUTING (DSC), 2021
  • [8] From ambiguities to insights: Query-based comparisons of high-dimensional data
    Kowalski, Jeanne
    Talbot, Conover
    Tsai, Hua L.
    Prasad, Nijaguna
    Umbricht, Christopher, Jr.
    Zeiger, Martha A.
    [J]. COMPUTATIONAL MODELS FOR LIFE SCIENCES (CMLS 07), 2007, 952 : 305 - +
  • [9] A Δ-tree based similarity join processing for high-dimensional data
    Liu, Yan
    Hao, Zhongxiao
    [J]. Jisuanji Yanjiu yu Fazhan/Computer Research and Development, 2009, 46 (06) : 995 - 1002
  • [10] A Performance Analysis of Prediction Techniques in Handling High-Dimensional Uncertain Data for the Application of Skyline Query Over Data Stream
    Mohamud, Mudathir Ahmed
    Ibrahim, Hamidah
    Sidi, Fatimah
    Rum, Siti Nurulain Mohd
    Dzolkhifli, Zarina Binti
    Xiaowei, Zhang
    [J]. IEEE ACCESS, 2024, 12 : 120877 - 120898