Partitioned optimization of complex queries

Cited by: 9
Authors
Chatziantoniou, Damianos
Ross, Kenneth A.
Affiliations
[1] Athens Univ Econ & Business, Dept Management Sci & Technol, Athens 11362, Greece
[2] Columbia Univ, Dept Comp Sci, New York, NY 10027 USA
Funding
National Science Foundation (USA)
Keywords
query processing; query languages; decision support queries; OLAP
DOI
10.1016/j.is.2005.09.003
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Performing complex analysis on top of massive data stores is essential to most modern enterprises and organizations and requires significant aggregation over different attribute sets (dimensions) of the participating relations. Such queries may take hours or days, a time period unacceptable in most cases. As a result, it is important to study these queries and identify special frequent cases that can be evaluated with specialized algorithms. Understanding complex aggregate queries leads to better execution plans and, consequently, better performance. The idea of partitioning is fundamental and central in aggregate queries. This concept can be used to define a class of queries called group queries. The main characteristic of a group query is that it can be evaluated in a partitioned (or groupwise) fashion, i.e. the underlying relation(s) can be partitioned (based on a set of attributes) into disjoint groups and each group can be processed separately, possibly in parallel. For example, a query that performs a complex operation (e.g. joins and/or selections and/or aggregations) within each group is a group query. To express it in SQL, one has to join/correlate several views and/or subqueries on the grouping attributes. A naive plan (where the joins are executed) may be very expensive, even for relatively small base relations. On the other hand, a groupwise evaluation can lead to huge performance gains. We present a syntactic criterion to identify group queries in SQL and show that every group query can be expressed in a way that satisfies this criterion. This work is based on Chatziantoniou and Ross [Querying Multiple Features of Groups in Relational Databases, in: 22nd International Conference on Very Large Databases, VLDB, 1996, pp. 295-306]. The concept of group queries is useful not only in terms of evaluation, but also in terms of analyzing a complex decision support query that aggregates over different sets of attributes.
In such a case the query may be decomposable into one or more query components, where each component is a group query. This observation allows parallel execution, multi-query processing and identification of special cases. We present in this paper two algorithms to decompose a complex aggregate query into its group query components. The value of groupwise processing has recently been recognized by the research community and implemented in at least one major commercial system. To be of use in a relational system, however, partitioned evaluation has to be modeled as a relational operator. We review three different approaches for such an operator and propose a generalized groupwise operator. We also perform some experiments to show that naive optimization with the new operator incorporated, without taking into consideration decompositions into group query components, does not always lead to the most efficient plans. An extended syntax is another way to identify special frequent cases and apply efficient algorithms. Having specific operators for common operations contributes to the succinctness and optimizability of certain queries (e.g. datacubes). An extended syntax is presented with emphasis on multi-feature queries, a frequent and practical subclass of group queries that is amenable to specialized evaluation, involving (potentially repeated) selection, grouping and aggregation over the same groups. © 2005 Elsevier B.V. All rights reserved.
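The groupwise evaluation described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the paper: the `sales` rows, the column names `cust` and `amount`, and the helper names `partition`, `above_average_count` and `groupwise_eval` are all hypothetical. The example query ("for each customer, count the sales whose amount exceeds that customer's average") would need a join between the base table and an aggregate view in plain SQL; here each group is instead processed on its own.

```python
from collections import defaultdict


def partition(rows, key):
    """Split rows into disjoint groups by the value of the grouping attribute."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return groups


def above_average_count(group):
    """Per-group operation: an aggregation followed by a selection and a count.

    Both steps look only at the rows of a single group, which is what
    makes the overall query evaluable groupwise.
    """
    amounts = [r["amount"] for r in group]
    avg = sum(amounts) / len(amounts)
    return sum(1 for a in amounts if a > avg)


def groupwise_eval(rows, key, per_group):
    """Partition once, then apply the per-group operation to each group."""
    return {k: per_group(g) for k, g in partition(rows, key).items()}


# Hypothetical sample data, for illustration only.
sales = [
    {"cust": "a", "amount": 10},
    {"cust": "a", "amount": 30},
    {"cust": "a", "amount": 50},
    {"cust": "b", "amount": 5},
    {"cust": "b", "amount": 5},
]

result = groupwise_eval(sales, "cust", above_average_count)
print(result)
```

Because the groups are disjoint and each call to `above_average_count` reads only its own group, the per-group calls could in principle be dispatched in parallel; this sketch runs them sequentially for simplicity.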
Pages: 248-282
Number of pages: 35