Approximate distributed top-k queries

Cited by: 5
Authors
Patt-Shamir, Boaz [1]
Shafrir, Allon [1]
Affiliation
[1] Tel Aviv Univ, Dept Elect Engn, IL-69978 Tel Aviv, Israel
Keywords
distributed algorithms; aggregate queries; communication complexity; sensor networks; random sampling
DOI
10.1007/s00446-008-0055-3
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
We consider a distributed system where each node keeps a local count for items (similar to elections where nodes are ballot boxes and items are candidates). A top-k query in such a system asks which k items have the largest global count, summed across all nodes in the system. In this paper, we present a Monte Carlo algorithm that outputs, with high probability, a set of k candidates which approximates the top-k items. The algorithm is motivated by sensor networks in that it focuses on reducing the individual communication complexity. In contrast to previous algorithms, the communication complexity depends only on the global scores and not on the partition of scores among nodes. If the number of nodes is large, our algorithm dramatically reduces the communication complexity when compared with deterministic algorithms. We show that the complexity of our algorithm is close to a lower bound on the cell-probe complexity of any non-interactive top-k approximation algorithm. We show that for some natural global distributions (such as the Geometric or Zipf distributions), our algorithm needs only a polylogarithmic number of communication bits per node.
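To make the query in the abstract concrete, the following Python sketch contrasts an exact top-k computation with a naive sampling-based approximation. It is an illustration only and is not the algorithm from the paper: the function names `exact_top_k` and `sampled_top_k`, the sampling probability `p`, and the toy data are all assumptions introduced here. The sketch assumes each node reports every unit of its local count independently with probability `p`, so the expected amount of reported data is `p` times the global count, however the counts are split among nodes.

```python
import random
from collections import Counter


def exact_top_k(nodes, k):
    """Sum every node's local counts and return the k items with the largest global count."""
    total = Counter()
    for local in nodes:
        total.update(local)  # adds the local counts to the running totals
    return [item for item, _ in total.most_common(k)]


def sampled_top_k(nodes, k, p, rng):
    """Hypothetical approximation: each unit of count is reported with probability p.

    The coordinator ranks items by the number of reported units only.
    """
    sampled = Counter()
    for local in nodes:
        for item, count in local.items():
            kept = sum(1 for _ in range(count) if rng.random() < p)
            if kept:
                sampled[item] += kept
    return [item for item, _ in sampled.most_common(k)]


# Toy data (assumed): three nodes acting as "ballot boxes" for items a, b, c, d.
nodes = [
    Counter({"a": 500, "b": 30, "c": 5}),
    Counter({"a": 20, "b": 400, "c": 10}),
    Counter({"a": 480, "b": 70, "d": 200}),
]

print(exact_top_k(nodes, 2))
print(sampled_top_k(nodes, 2, 0.1, random.Random(0)))
```

With `p = 1.0` every unit is reported and the sampled result coincides with the exact one. Smaller values of `p` reduce the reported data but make the answer probabilistic, which is the general trade-off a Monte Carlo approach accepts.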
Pages: 1-22
Number of pages: 22
Related papers
50 in total
  • [1] Approximate distributed top-k queries
    Boaz Patt-Shamir
    Allon Shafrir
    [J]. Distributed Computing, 2008, 21 : 1 - 22
  • [2] Optimizing Distributed Top-k Queries
    Neumann, Thomas
    Bender, Matthias
    Michel, Sebastian
    Schenkel, Ralf
    Triantafillou, Peter
    Weikum, Gerhard
    [J]. WEB INFORMATION SYSTEMS ENGINEERING - WISE 2008, PROCEEDINGS, 2008, 5175 : 337 - +
  • [3] Approximate top-k queries in sensor networks
    Patt-Shamir, Boaz
    Shafrir, Allon
    [J]. STRUCTURAL INFORMATION AND COMMUNICATION COMPLEXITY, PROCEEDINGS, 2006, 4056 : 319 - +
  • [4] Efficient processing of distributed top-k queries
    Yu, HL
    Li, HG
    Wu, P
    Agrawal, D
    El Abbadi, A
    [J]. DATABASE AND EXPERT SYSTEMS APPLICATIONS, PROCEEDINGS, 2005, 3588 : 65 - 74
  • [5] Distributed top-k aggregation queries at large
    Thomas Neumann
    Matthias Bender
    Sebastian Michel
    Ralf Schenkel
    Peter Triantafillou
    Gerhard Weikum
    [J]. Distributed and Parallel Databases, 2009, 26 : 3 - 27
  • [6] Distributed top-k aggregation queries at large
    Neumann, Thomas
    Bender, Matthias
    Michel, Sebastian
    Schenkel, Ralf
    Triantafillou, Peter
    Weikum, Gerhard
    [J]. DISTRIBUTED AND PARALLEL DATABASES, 2009, 26 (01) : 3 - 27
  • [7] Top-k Approximate Answers to XPath Queries with Negation
    Fazzinga, Bettina
    Flesca, Sergio
    Pugliese, Andrea
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2014, 26 (10) : 2561 - 2573
  • [8] Finding Top-k Approximate Answers to Path Queries
    Hurtado, Carlos A.
    Poulovassilis, Alexandra
    Wood, Peter T.
    [J]. FLEXIBLE QUERY ANSWERING SYSTEMS: 8TH INTERNATIONAL CONFERENCE, FQAS 2009, 2009, 5822 : 465 - 476
  • [9] Lightweight Approximate Top-k for Distributed Settings
    Deolalikar, Vinay
    Eshghi, Kave
    [J]. 2014 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2014, : 835 - 844
  • [10] Top-k vectorial aggregation queries in a distributed environment
    Sagy, Guy
    Sharfman, Izchak
    Keren, Daniel
    Schuster, Assaf
    [J]. JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2011, 71 (02) : 302 - 315