Parallel Star Join plus DataIndexes: Efficient query processing in data warehouses and OLAP

Times cited: 11
Authors
Datta, A [1]
VanderMeer, D
Ramamritham, K
Affiliations
[1] Georgia Inst Technol, Atlanta, GA 30332 USA
[2] Indian Inst Technol, Bombay 400076, Maharashtra, India
Keywords
parallel star join; OLAP; query processing; DataIndexes
DOI
10.1109/TKDE.2002.1047769
Chinese Library Classification number
TP18 [Theory of artificial intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
On-Line Analytical Processing (OLAP) refers to the technologies that allow users to efficiently retrieve data from the data warehouse for decision-support purposes. Data warehouses tend to be extremely large; a data warehouse may well be hundreds of gigabytes to terabytes in size [3]. Queries tend to be complex and ad hoc, often requiring computationally expensive operations such as joins and aggregation. We are therefore interested in developing strategies that improve query processing in data warehouses by exploring the applicability of parallel processing techniques. In particular, we exploit the natural partitionability of a star schema and make it even more efficient by applying DataIndexes, a storage structure that serves both as an index and as data and lends itself naturally to vertical partitioning of the data. DataIndexes are derived from the various special-purpose access mechanisms currently supported in commercial OLAP products. Specifically, we propose a declustering strategy that incorporates both task and data partitioning, and we present the Parallel Star Join (PSJ) Algorithm, which performs a star join in parallel using efficient operations involving only rowsets and projection columns. We compare the performance of the PSJ Algorithm with two parallel query processing strategies. The first is a parallel join strategy utilizing the Bitmap Join Index (BJI), arguably the state-of-the-art OLAP join structure in use today. For the second, we choose a well-known parallel join algorithm, namely the pipelined hash algorithm. To assist in the performance comparison, we first develop a cost model of the disk access and transmission costs for all three approaches.
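A minimal sketch of a star join built only from rowsets and projection columns, under loudly stated assumptions: each column is stored on its own and addressed by row position (our stand-in for a DataIndex), and the fact table's join columns are assumed to hold the positions of the matching dimension rows. The names `dimension_rowset`, `fact_rowset`, `parallel_star_join` and `join_cols` are hypothetical; this is not the paper's PSJ Algorithm or its declustering scheme, only an illustration of task partitioning (one task per restricted dimension) and data partitioning (one task per range of fact rows).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence, Set, Tuple

# Hypothetical toy layout, assumed for illustration only:
# - every column is a separate list indexed by row position (vertical partitioning);
# - a join column for dimension d holds, per fact row, the position of the
#   matching row in dimension d, so no key lookup is needed at query time.
Columns = Dict[str, List[object]]
Predicate = Callable[[Dict[str, object]], bool]


def dimension_rowset(dim: Columns, pred: Predicate) -> Set[int]:
    """Rowset (set of positions) of dimension rows that satisfy pred."""
    n = len(next(iter(dim.values())))
    return {i for i in range(n) if pred({name: col[i] for name, col in dim.items()})}


def fact_rowset(join_cols: Dict[str, List[int]], dim_rowsets: Dict[str, Set[int]],
                lo: int, hi: int) -> List[int]:
    """Fact positions in [lo, hi) whose join entries all fall in the dimension rowsets."""
    return [r for r in range(lo, hi)
            if all(join_cols[d][r] in rs for d, rs in dim_rowsets.items())]


def parallel_star_join(fact: Columns, join_cols: Dict[str, List[int]],
                       dims: Dict[str, Columns], preds: Dict[str, Predicate],
                       project: Sequence[str], workers: int = 4) -> List[Tuple[object, ...]]:
    n = len(next(iter(join_cols.values())))
    step = max(1, -(-n // workers))  # ceiling division: rows per fact partition
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Task partitioning: each restricted dimension is evaluated independently.
        futures = {d: pool.submit(dimension_rowset, dims[d], p) for d, p in preds.items()}
        dim_rowsets = {d: f.result() for d, f in futures.items()}
        # Data partitioning: each worker scans one contiguous range of fact rows.
        parts = pool.map(lambda lo: fact_rowset(join_cols, dim_rowsets, lo, min(lo + step, n)),
                         range(0, n, step))
        rowset = [r for part in parts for r in part]  # map() preserves range order
    # Projection: read only the requested columns, by position.
    return [tuple(fact[c][r] for c in project) for r in rowset]


# Usage on a toy sales star schema.
dims = {"store": {"city": ["Atlanta", "Mumbai", "Atlanta"]},
        "time": {"year": [2001, 2002]}}
join_cols = {"store": [0, 1, 2, 0, 2], "time": [0, 1, 1, 1, 0]}
fact = {"amount": [10, 20, 30, 40, 50]}
preds = {"store": lambda row: row["city"] == "Atlanta",
         "time": lambda row: row["year"] == 2002}
result = parallel_star_join(fact, join_cols, dims, preds, ["amount"], workers=2)
print(result)  # [(30,), (40,)]
```

Threads are used here only to show the shape of the partitioning; in CPython the global interpreter lock keeps this CPU-bound loop from running truly in parallel, so a real system would place partitions on separate processes or nodes.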
Performance comparisons show that the DataIndex-based approach leads to dramatically lower disk access costs than the BJI and the hybrid hash approach in both speedup and scaleup experiments, while the hash-based approach outperforms the BJI in disk access costs. With regard to transmission overhead, PSJ and BJI outperform the hash-based approach. Overall, our parallel star join algorithm and DataIndexes form a winning combination.
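For contrast, the two baselines can be sketched on a toy schema. This is a hedged, serial illustration, not the paper's parallel implementations or its cost model: the bitmap join index is modelled as a map from a dimension attribute value to a bitmap over fact-row positions (Python integers serve as bitmaps), and the hash join as a build phase over filtered dimensions followed by a probe pipeline of generators. All names (`build_bji`, `bji_star_join`, `probe`, `hash_star_join`) are hypothetical.

```python
from typing import Dict, Iterable, Iterator, List, Mapping, Tuple

# Hypothetical toy star schema (assumed for illustration only).
stores: Dict[int, str] = {0: "Atlanta", 1: "Mumbai", 2: "Atlanta"}  # store_key -> city
years: Dict[int, int] = {0: 2001, 1: 2002}                          # time_key -> year
Row = Tuple[object, ...]
facts: List[Row] = [  # (store_key, time_key, amount)
    (0, 0, 10), (1, 1, 20), (2, 1, 30), (0, 1, 40), (2, 0, 50),
]


def build_bji(fact_rows: List[Row], fk_pos: int, dim: Mapping[int, object]) -> Dict[object, int]:
    """Bitmap join index: dimension attribute value -> bitmap over fact row positions."""
    bji: Dict[object, int] = {}
    for r, row in enumerate(fact_rows):
        value = dim[row[fk_pos]]
        bji[value] = bji.get(value, 0) | (1 << r)  # Python int used as a bitmap
    return bji


# Built once, ahead of query time.
city_bji = build_bji(facts, 0, stores)
year_bji = build_bji(facts, 1, years)


def bji_star_join(city: str, year: int) -> List[int]:
    """AND one bitmap per restricted dimension, then decode the set bits."""
    bits = city_bji.get(city, 0) & year_bji.get(year, 0)
    return [r for r in range(len(facts)) if (bits >> r) & 1]


def probe(rows: Iterable[Tuple[int, Row]], fk_pos: int,
          table: Mapping[int, object]) -> Iterator[Tuple[int, Row]]:
    """One pipeline stage: join each streamed row against a dimension hash table."""
    for r, row in rows:
        if row[fk_pos] in table:
            yield r, row + (table[row[fk_pos]],)  # append the dimension attribute


def hash_star_join(city: str, year: int) -> List[int]:
    # Build phase: hash tables over the qualifying rows of each dimension.
    store_ht = {k: c for k, c in stores.items() if c == city}
    time_ht = {k: y for k, y in years.items() if y == year}
    # Probe phase: fact rows flow through both stages without intermediate materialization.
    pipeline = probe(probe(enumerate(facts), 0, store_ht), 1, time_ht)
    return [r for r, _ in pipeline]


print(bji_star_join("Atlanta", 2002), hash_star_join("Atlanta", 2002))  # [2, 3] [2, 3]
```

Both functions return the same fact positions; they differ in the data they touch: the BJI version reads one precomputed bitmap per restricted value, while the hash version streams whole fact tuples through each probe stage.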
Pages: 1299-1316
Number of pages: 18
Related papers (50 in total)
  • [1] Efficient OLAP query processing in distributed data warehouses
    Akinde, M
    Böhlen, M
    Johnson, T
    Lakshmanan, LVS
    Srivastava, D
    18th International Conference on Data Engineering, Proceedings, 2002, p. 262
  • [2] Efficient OLAP query processing in distributed data warehouses
    Akinde, MO
    Böhlen, MH
    Johnson, T
    Lakshmanan, LVS
    Srivastava, D
    Advances in Database Technology - EDBT 2002, 2002, vol. 2287, pp. 336-353
  • [3] Efficient OLAP query processing in distributed data warehouses
    Akinde, MO
    Böhlen, MH
    Johnson, T
    Lakshmanan, LVS
    Srivastava, D
    Information Systems, 2003, vol. 28, no. 1-2, pp. 111-135
  • [4] XML-based OLAP query processing in a federated data warehouses
    Mangisengi, O
    Essmayr, W
    Huber, J
    Weippl, E
    Enterprise Information Systems V, 2004, pp. 93-100
  • [5] On Index Structures for Star Query Processing in Data Warehouses
    Wojciechowski, Artur
    Wrembel, Robert
    Business Intelligence, eBISS 2013, 2014, vol. 172, pp. 182-217
  • [6] Range sum query processing in parallel data warehouses
    Li, JZ
    Gao, H
    Parallel and Distributed Computing, Applications and Technologies, PDCAT'2003, Proceedings, 2003, pp. 877-881
  • [7] Scatter-Gather-Merge: An efficient star-join query processing algorithm for data-parallel frameworks
    Han, Hyuck
    Jung, Hyungsoo
    Eom, Hyeonsang
    Yeom, Heon Y.
    Cluster Computing: The Journal of Networks, Software Tools and Applications, 2011, vol. 14, no. 2, pp. 183-197
  • [8] Efficient of bitmap join indexes for optimising star join queries in relational data warehouses
    Yahyaoui, Mohammed
    Amjad, Souad
    Benameur, Lamia
    Jellouli, Ismail
    International Journal of Computational Intelligence Studies, 2020, vol. 9, no. 3, pp. 220-233
  • [9] Parallel OLAP query processing in database clusters with data replication
    Lima, Alexandre A. B.
    Furtado, Camille
    Valduriez, Patrick
    Mattoso, Marta
    Distributed and Parallel Databases, 2009, vol. 25, pp. 97-123