Distributed top-k aggregation queries at large

Cited by: 12
Authors
Neumann, Thomas [1]
Bender, Matthias [1]
Michel, Sebastian [2]
Schenkel, Ralf [1, 3]
Triantafillou, Peter [4]
Weikum, Gerhard [1]
Affiliations
[1] Max Planck Inst Informat, Saarbrucken, Germany
[2] Ecole Polytech Fed Lausanne, Lausanne, Switzerland
[3] Univ Saarland, D-6600 Saarbrucken, Germany
[4] Univ Patras, Patras, Greece
Keywords
Top-k; Distributed queries; Query optimization; Cost models; Selection queries
DOI
10.1007/s10619-009-7041-z
Chinese Library Classification
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Top-k query processing is a fundamental building block for efficient ranking in a large number of applications. Efficiency is a central issue, especially for distributed settings, when the data is spread across different nodes in a network. This paper introduces novel optimization methods for top-k aggregation queries in such distributed environments. The optimizations can be applied to all algorithms that fall into the frameworks of the prior TPUT and KLEE methods. The optimizations address three degrees of freedom: 1) hierarchically grouping input lists into top-k operator trees and optimizing the tree structure, 2) computing data-adaptive scan depths for different input sources, and 3) data-adaptive sampling of a small subset of input sources in scenarios with hundreds or thousands of query-relevant network nodes. All optimizations are based on a statistical cost model that utilizes local synopses, e.g., in the form of histograms, efficiently computed convolutions, and estimators based on order statistics. The paper presents comprehensive experiments, with three different real-life datasets and using the ns-2 network simulator for a packet-level simulation of a large Internet-style network.
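For orientation, the three-phase threshold scheme that the TPUT framework is generally described as following can be sketched as below. This is a minimal single-process simulation, not the optimizations introduced in the paper: it assumes sum aggregation and non-negative local scores, and the names `tput_top_k`, `kth_largest`, and the dictionary-per-node data layout are hypothetical choices made for illustration.

```python
import heapq
from collections import defaultdict


def kth_largest(values, k):
    """Return the k-th largest value, or 0.0 if fewer than k values exist."""
    top = heapq.nlargest(k, values)
    return top[-1] if len(top) == k else 0.0


def tput_top_k(nodes, k):
    """Sketch of a TPUT-style top-k over `nodes` (assumed layout: one dict
    per node, mapping item -> non-negative local score; aggregate = sum)."""
    m = len(nodes)
    if m == 0 or k <= 0:
        return []

    # Phase 1: every node reports its local top-k; the k-th best partial
    # sum is a lower bound on the true k-th aggregate score.
    partial = defaultdict(float)
    for local in nodes:
        best = heapq.nlargest(k, local.items(), key=lambda kv: kv[1])
        for item, score in best:
            partial[item] += score
    threshold = kth_largest(partial.values(), k) / m  # uniform threshold

    # Phase 2: every node reports all items scoring at least the threshold.
    partial = defaultdict(float)
    reported_by = defaultdict(int)
    for local in nodes:
        for item, score in local.items():
            if score >= threshold:
                partial[item] += score
                reported_by[item] += 1
    lower_bound = kth_largest(partial.values(), k)

    # An unreported local score is below the threshold, which gives an
    # upper bound per item; items whose bound is too low are pruned.
    candidates = [
        item for item in partial
        if partial[item] + threshold * (m - reported_by[item]) >= lower_bound
    ]

    # Phase 3: look up exact scores for the surviving candidates only.
    exact = {
        item: sum(local.get(item, 0.0) for local in nodes)
        for item in candidates
    }
    return sorted(exact.items(), key=lambda kv: (-kv[1], kv[0]))[:k]
```

In a real deployment each loop over `nodes` would be a network round trip, so the cost is dominated by how many items pass the phase-2 threshold; the scan-depth and tree-structure optimizations named in the abstract target that cost.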
Pages: 3-27
Number of pages: 25
Related papers
  • [1] Distributed top-k aggregation queries at large
    Thomas Neumann
    Matthias Bender
    Sebastian Michel
    Ralf Schenkel
    Peter Triantafillou
    Gerhard Weikum
    [J]. Distributed and Parallel Databases, 2009, 26 : 3 - 27
  • [2] Top-k vectorial aggregation queries in a distributed environment
    Sagy, Guy
    Sharfman, Izchak
    Keren, Daniel
    Schuster, Assaf
    [J]. JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2011, 71 (02) : 302 - 315
  • [3] Top-K Aggregation Queries Over Large Networks
    Yan, Xifeng
    He, Bin
    Zhu, Feida
    Han, Jiawei
    [J]. 26TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING ICDE 2010, 2010 : 377 - 380
  • [4] Approximate distributed top-k queries
    Boaz Patt-Shamir
    Allon Shafrir
    [J]. Distributed Computing, 2008, 21 : 1 - 22
  • [5] Optimizing Distributed Top-k Queries
    Neumann, Thomas
    Bender, Matthias
    Michel, Sebastian
    Schenkel, Ralf
    Triantafillou, Peter
    Weikum, Gerhard
    [J]. WEB INFORMATION SYSTEMS ENGINEERING - WISE 2008, PROCEEDINGS, 2008, 5175 : 337 - +
  • [6] Approximate distributed top-k queries
    Patt-Shamir, Boaz
    Shafrir, Allon
    [J]. DISTRIBUTED COMPUTING, 2008, 21 (01) : 1 - 22
  • [7] Efficient processing of distributed top-k queries
    Yu, HL
    Li, HG
    Wu, P
    Agrawal, D
    El Abbadi, A
    [J]. DATABASE AND EXPERT SYSTEMS APPLICATIONS, PROCEEDINGS, 2005, 3588 : 65 - 74
  • [8] Secure Distributed Top-k Aggregation
    Jonsson, Kristjan V.
    Palmskog, Karl
    Vigfusson, Ymir
    [J]. 2012 IEEE INTERNATIONAL CONFERENCE ON COMMUNICATIONS (ICC), 2012,
  • [9] Processing top-k queries in distributed hash tables
    Akbarinia, Reza
    Pacitti, Esther
    Valduriez, Patrick
    [J]. EURO-PAR 2007 PARALLEL PROCESSING, PROCEEDINGS, 2007, 4641 : 489 - +
  • [10] Algebraic query optimization for distributed top-k queries
    Neumann, Thomas
    Michel, Sebastian
    [J]. COMPUTER SCIENCE-RESEARCH AND DEVELOPMENT, 2007, 21 (3-4) : 197 - 211