Evaluating top-k queries over web-accessible databases

Cited by: 133
Authors
Marian, A [1]
Bruno, N
Gravano, L
Affiliations
[1] Columbia Univ, Dept Comp Sci, 1214 Amsterdam Ave, New York, NY 10027 USA
[2] Microsoft Res, Redmond, WA USA
Source
ACM TRANSACTIONS ON DATABASE SYSTEMS | 2004 / Vol. 29 / No. 2
Keywords
algorithms; measurement; performance; parallel query processing; query optimization; top-k query processing; web databases
DOI
10.1145/1005566.1005569
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
A query to a web search engine usually consists of a list of keywords, to which the search engine responds with the best or "top" k pages for the query. This top-k query model is prevalent over multimedia collections in general, but also over plain relational data for certain applications. For example, consider a relation with information on available restaurants, including their location, price range for one diner, and overall food rating. A user who queries such a relation might simply specify the user's location and target price range, and expect in return the best 10 restaurants in terms of some combination of proximity to the user, closeness of match to the target price range, and overall food rating. Processing top-k queries efficiently is challenging for a number of reasons. One critical such reason is that, in many web applications, the relation attributes might not be available other than through external web-accessible form interfaces, which we will have to query repeatedly for a potentially large set of candidate objects. In this article, we study how to process top-k queries efficiently in this setting, where the attributes for which users specify target values might be handled by external, autonomous sources with a variety of access interfaces. We present a sequential algorithm for processing such queries, but observe that any sequential top-k query processing strategy is bound to require unnecessarily long query processing times, since web accesses exhibit high and variable latency. Fortunately, web sources can be probed in parallel, and each source can typically process concurrent requests, although sources may impose some restrictions on the type and number of probes that they are willing to accept. We adapt our sequential query processing technique and introduce an efficient algorithm that maximizes source-access parallelism to minimize query response time, while satisfying source-access constraints. 
We evaluate our techniques experimentally using both synthetic and real web-accessible data and show that parallel algorithms can be significantly more efficient than their sequential counterparts.
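The restaurant example in the abstract can be illustrated with a small sketch. This is not the article's algorithm: it is a naive baseline that probes every source for every object, and only shows the top-k query model with the sources for one object probed in parallel. The restaurant data, the three scoring functions, the weights, and the weighted-sum combination are all assumptions made for illustration.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

# Hypothetical restaurant data. In the article's setting each attribute
# would sit behind a separate web-accessible source.
RESTAURANTS = {
    "Alpha": {"distance_km": 0.5, "price": 30, "rating": 4.0},
    "Bravo": {"distance_km": 3.0, "price": 25, "rating": 4.5},
    "Carmen": {"distance_km": 5.0, "price": 60, "rating": 5.0},
    "Delta": {"distance_km": 1.0, "price": 20, "rating": 2.5},
}


def proximity_score(name, max_km=10.0):
    """Stand-in for a source that scores closeness to the user in [0, 1]."""
    return max(0.0, 1.0 - RESTAURANTS[name]["distance_km"] / max_km)


def price_score(name, target=25, max_diff=50):
    """Stand-in for a source that scores closeness to the target price."""
    diff = abs(RESTAURANTS[name]["price"] - target)
    return max(0.0, 1.0 - diff / max_diff)


def rating_score(name):
    """Stand-in for a source that returns the food rating scaled to [0, 1]."""
    return RESTAURANTS[name]["rating"] / 5.0


SOURCES = [proximity_score, price_score, rating_score]
WEIGHTS = [0.4, 0.3, 0.3]  # assumed weights for the combined score


def combined_score(name, pool):
    """Probe all sources for one object concurrently, then combine."""
    futures = [pool.submit(source, name) for source in SOURCES]
    return sum(w * f.result() for w, f in zip(WEIGHTS, futures))


def top_k(k):
    """Return the k best (score, name) pairs, highest score first."""
    with ThreadPoolExecutor(max_workers=len(SOURCES)) as pool:
        scored = [(combined_score(name, pool), name) for name in RESTAURANTS]
    return heapq.nlargest(k, scored)


if __name__ == "__main__":
    for score, name in top_k(2):
        print(f"{name}: {score:.2f}")
```

With real web sources, each scoring function would be a network call with high and variable latency, which is where issuing the probes concurrently would pay off. A practical strategy would also try to avoid probing objects that cannot reach the top k, which this sketch does not attempt.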
Pages: 319-362
Page count: 44