Efficient parallel processing of range queries through replicated declustering

Cited by: 17
Authors
Ferhatosmanoglu, Hakan
Tosun, Ali Saman
Canahuate, Guadalupe [1]
Ramachandran, Aravind
Affiliations
[1] Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210 USA
[2] Univ Texas, Dept Comp Sci, San Antonio, TX 78249 USA
[3] Microsoft Corp, Redmond, WA 98052 USA
Funding
National Science Foundation (USA)
Keywords
declustering; replication; parallel access; range queries; periodic allocation; optimal parallel processing; replicated declustering;
DOI
10.1007/s10619-006-9362-5
Chinese Library Classification number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
A common technique for minimizing I/O in data-intensive applications is data declustering over parallel servers. This technique distributes data among several disks so as to parallelize query retrieval and thus improve performance. We focus on optimizing access to large spatial data and on the most common type of query on such data, the range query. An optimal declustering scheme is one in which the processing for all range queries is balanced uniformly among the available disks. It has been shown that single-copy declustering schemes are non-optimal for range queries. In this paper, we combine replication with parallel disk declustering for efficient processing of range queries. We note that replication is already widely used in database applications for purposes such as load balancing, fault tolerance, and availability of data. We develop theoretical foundations for replicated declustering and propose a class of replicated declustering schemes, periodic allocations, which are shown to be strictly optimal for a number of disks. We propose a framework for replicated declustering that uses a limited amount of replication, and we provide extensions for applying it to real data, including arbitrary grids and a large number of disks. Our framework also provides an effective indexing scheme that enables fast identification of the data of interest in parallel servers. In addition to optimal processing of single queries, we show that this framework is effective for parallel processing of multiple queries. We present experimental results comparing the proposed replication scheme to other techniques for both single queries and multiple queries, on synthetic and real data sets.
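The optimality notion in the abstract can be made concrete with a small sketch. It is an illustration under stated assumptions, not the paper's scheme: the allocation formula `(a*i + b*j + shift) mod N`, the choice of a second copy shifted by 2, and all function names are hypothetical. The cost of a range query is taken to be the largest number of its buckets that any one disk must serve, and the lower bound is ceil(|Q| / N) for a query of |Q| buckets on N disks.

```python
from collections import Counter
from math import ceil


def periodic_disk(i, j, num_disks, a=1, b=1, shift=0):
    """Disk for grid bucket (i, j) under an assumed periodic allocation."""
    return (a * i + b * j + shift) % num_disks


def query_buckets(top, left, height, width):
    """All (row, column) buckets covered by a rectangular range query."""
    return [(i, j) for i in range(top, top + height)
            for j in range(left, left + width)]


def single_copy_cost(buckets, num_disks, a=1, b=1):
    """Largest number of query buckets stored on any one disk."""
    loads = Counter(periodic_disk(i, j, num_disks, a, b) for i, j in buckets)
    return max(loads.values()) if loads else 0


def feasible(candidates, num_disks, cap):
    """Can every bucket be read from one of its replicas with <= cap per disk?"""
    load = [[] for _ in range(num_disks)]  # buckets currently assigned per disk

    def place(bucket, seen):
        for disk in candidates[bucket]:
            if disk in seen:
                continue
            seen.add(disk)
            if len(load[disk]) < cap:
                load[disk].append(bucket)
                return True
            # Disk is full: try to move one of its buckets to another replica.
            for idx, other in enumerate(load[disk]):
                if place(other, seen):
                    load[disk][idx] = bucket
                    return True
        return False

    return all(place(b, set()) for b in range(len(candidates)))


def replicated_cost(candidates, num_disks):
    """Smallest per-disk load for which a replica assignment exists."""
    cap = ceil(len(candidates) / num_disks)  # lower bound on any schedule
    while not feasible(candidates, num_disks, cap):
        cap += 1
    return cap


def two_copy_candidates(buckets, num_disks, shift=2):
    """Replica disks per bucket: the base copy and a shifted copy (assumed)."""
    return [(periodic_disk(i, j, num_disks),
             periodic_disk(i, j, num_disks, shift=shift)) for i, j in buckets]
```

For a 2 x 2 query at the grid origin on four disks, the single copy `(i + j) mod 4` places two buckets on disk 1 and so costs 2 accesses, while adding the shifted copy lets the four buckets be read from four different disks, matching the lower bound of 1.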
Pages: 117 - 147
Page count: 31
Related papers
50 in total
  • [1] Efficient parallel processing of range queries through replicated declustering
    Hakan Ferhatosmanoglu
    Ali Şaman Tosun
    Guadalupe Canahuate
    Aravind Ramachandran
    [J]. Distributed and Parallel Databases, 2006, 20 : 117 - 147
  • [2] Selective Replicated Declustering for Arbitrary Queries
    Oktay, K. Yasin
    Turk, Ata
    Aykanat, Cevdet
    [J]. EURO-PAR 2009: PARALLEL PROCESSING, PROCEEDINGS, 2009, 5704 : 375 - 386
  • [3] A hierarchical technique for constructing efficient declustering schemes for range queries
    Bhatia, R
    Sinha, RK
    Chen, CM
    [J]. COMPUTER JOURNAL, 2003, 46 (04) : 358 - 377
  • [4] Hierarchical declustering schemes for range queries
    Bhatia, R
    Sinha, RK
    Chen, CM
    [J]. ADVANCES IN DATABASE TECHNOLOGY-EDBT 2000, PROCEEDINGS, 2000, 1777 : 525 - 537
  • [5] Asymptotically optimal declustering schemes for range queries
    Sinha, RK
    Bhatia, R
    Chen, CM
    [J]. DATABASE THEORY - ICDT 2001, PROCEEDINGS, 2001, 1973 : 144 - 158
  • [6] Efficient Parallel Processing for KNN Queries
    Jiang, Tao
    Zhang, Bin
    Yu, Fahong
    [J]. PROCEEDINGS OF 2017 INTERNATIONAL CONFERENCE ON INDUSTRIAL DESIGN ENGINEERING (ICIDE 2017), 2017 : 88 - 94
  • [7] Scalability analysis of declustering methods for multidimensional range queries
    Moon, BK
    Saltz, JH
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 1998, 10 (02) : 310 - 327
  • [8] From discrepancy to declustering: near-optimal multidimensional declustering strategies for range queries
    Chen, CM
    [J]. JOURNAL OF THE ACM, 2004, 51 (01) : 46 - 73
  • [9] An efficient parallel processing method for skyline queries in MapReduce
    Kim, Junsu
    Kim, Myoung Ho
    [J]. JOURNAL OF SUPERCOMPUTING, 2018, 74 (02) : 886 - 935
  • [10] Efficient Parallel Processing of Analytical Queries on Linked Data
    Hagedorn, Stefan
    Sattler, Kai-Uwe
    [J]. ON THE MOVE TO MEANINGFUL INTERNET SYSTEMS: OTM 2013 CONFERENCES, 2013, 8185 : 452 - 469