Evaluating top-k queries over web-accessible databases

Cited by: 133

Authors:

Marian, A [1]

Bruno, N

Gravano, L

Affiliations:

[1] Columbia Univ, Dept Comp Sci, 1214 Amsterdam Ave, New York, NY 10027 USA

[2] Microsoft Res, Redmond, WA USA

Source:

ACM TRANSACTIONS ON DATABASE SYSTEMS | 2004 / Vol. 29 / No. 2

Keywords:

algorithms; measurement; performance; parallel query processing; query optimization; top-k query processing; web databases

DOI:

10.1145/1005566.1005569

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Subject classification code:

0812

Abstract:

A query to a web search engine usually consists of a list of keywords, to which the search engine responds with the best or "top" k pages for the query. This top-k query model is prevalent over multimedia collections in general, but also over plain relational data for certain applications. For example, consider a relation with information on available restaurants, including their location, price range for one diner, and overall food rating. A user who queries such a relation might simply specify the user's location and target price range, and expect in return the best 10 restaurants in terms of some combination of proximity to the user, closeness of match to the target price range, and overall food rating. Processing top-k queries efficiently is challenging for a number of reasons. One critical reason is that, in many web applications, the relation attributes might not be available other than through external web-accessible form interfaces, which we will have to query repeatedly for a potentially large set of candidate objects. In this article, we study how to process top-k queries efficiently in this setting, where the attributes for which users specify target values might be handled by external, autonomous sources with a variety of access interfaces. We present a sequential algorithm for processing such queries, but observe that any sequential top-k query processing strategy is bound to require unnecessarily long query processing times, since web accesses exhibit high and variable latency. Fortunately, web sources can be probed in parallel, and each source can typically process concurrent requests, although sources may impose some restrictions on the type and number of probes that they are willing to accept. We adapt our sequential query processing technique and introduce an efficient algorithm that maximizes source-access parallelism to minimize query response time, while satisfying source-access constraints.
We evaluate our techniques experimentally using both synthetic and real web-accessible data and show that parallel algorithms can be significantly more efficient than their sequential counterparts.
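To make the restaurant example concrete, here is a minimal sketch of a generic threshold-style top-k evaluation. It is not the authors' sequential or parallel algorithm. It assumes one source that supports sorted access by food rating and two probe-only sources (proximity and price match) that answer one restaurant at a time, with all scores normalized to [0, 1] and combined by an unweighted average. The helper `top_k`, the data, and the source names are hypothetical. The loop stops reading once no unseen restaurant can beat the current k-th best, and it skips the remaining probes for a restaurant whose upper bound can no longer reach the top k. The probe counter stands in for the high-latency web accesses the abstract is concerned with.

```python
import heapq


def top_k(sorted_source, probe_sources, k):
    """Threshold-style top-k sketch (hypothetical helper, not the article's algorithm).

    sorted_source: (object_id, score) pairs in descending score order, e.g. a
        source returning restaurants best-rated first (assumed).
    probe_sources: dicts mapping object_id -> score; each lookup stands in for
        one probe of a web form that only answers per-object queries (assumed).
    Scores are assumed to lie in [0, 1]; the final score is their plain average.
    Returns (top-k as (score, object_id) pairs, best first; number of probes).
    """
    if k <= 0:
        return [], 0
    n_attrs = len(probe_sources) + 1
    best = []  # min-heap holding the current top-k; best[0] is the lowest of them
    probes = 0
    for obj, s0 in sorted_source:
        # Any object read from here on has a sorted-source score <= s0, and each
        # probed attribute contributes at most 1, so none can score above this.
        threshold = (s0 + len(probe_sources)) / n_attrs
        if len(best) == k and best[0][0] >= threshold:
            break  # the current top-k can no longer be beaten
        partial, remaining, pruned = s0, len(probe_sources), False
        for source in probe_sources:
            partial += source[obj]  # one (expensive, high-latency) remote probe
            probes += 1
            remaining -= 1
            # Upper bound of this object: known scores plus 1 per unprobed attribute.
            if len(best) == k and (partial + remaining) / n_attrs <= best[0][0]:
                pruned = True  # cannot enter the top-k; skip its other probes
                break
        if pruned:
            continue
        score = partial / n_attrs
        if len(best) < k:
            heapq.heappush(best, (score, obj))
        elif score > best[0][0]:
            heapq.heapreplace(best, (score, obj))
    return sorted(best, reverse=True), probes


# Hypothetical restaurant data, all scores normalized to [0, 1].
rating = [("A", 0.9), ("B", 0.8), ("C", 0.6), ("D", 0.4), ("E", 0.2)]
proximity = {"A": 0.2, "B": 0.9, "C": 0.9, "D": 0.4, "E": 1.0}
price_match = {"A": 0.5, "B": 0.7, "C": 0.8, "D": 0.1, "E": 0.9}

top, probes = top_k(rating, [proximity, price_match], k=2)
print([obj for _, obj in top], probes)  # ['B', 'C'] 7
```

Restaurant E is never probed even though it is the closest and best-priced option. When it is read, its best possible score, (0.2 + 1 + 1)/3 ≈ 0.73, is already below C's 0.77, so the loop stops after 7 probes instead of 10. The probes for different restaurants are independent requests, which is what makes the parallel probing described in the abstract possible. The sketch keeps them sequential for clarity.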


Pages: 319-362

Number of pages: 44

Related records (50 in total; entries 1-10 shown):

[1] Evaluating Top-k queries over web-accessible Databases
Bruno, N
Gravano, L
Marian, A
[J]. 18TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING, PROCEEDINGS, 2002: 369 ff.
[2] Evaluating Top-k Skyline queries over relational databases
Brando, Carmen
Goncalves, Marlene
Gonzalez, Vanessa
[J]. DATABASE AND EXPERT SYSTEMS APPLICATIONS, PROCEEDINGS, 2007, 4653: 254 ff.
[3] Top-k queries over web applications
Deutch, Daniel
Milo, Tova
Polyzotis, Neoklis
[J]. The VLDB Journal, 2013, 22: 519 - 542
[4] Top-k queries over web applications
Deutch, Daniel
Milo, Tova
Polyzotis, Neoklis
[J]. VLDB JOURNAL, 2013, 22 (04): 519 - 542
[5] Evaluating TOP-K Queries Over Business Processes
Deutch, Daniel
Milo, Tova
[J]. ICDE: 2009 IEEE 25TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING, VOLS 1-3, 2009: 1195 - 1198
[6] Progressive skylining over Web-accessible databases
Lo, E
Yip, KY
Lin, KP
Cheung, DW
[J]. DATA & KNOWLEDGE ENGINEERING, 2006, 57 (02): 122 - 147
[7] Evaluating continuous top-k queries over document streams
Rao, Weixiong
Chen, Lei
Chen, Shudong
Tarkoma, Sasu
[J]. World Wide Web, 2014, 17: 59 - 83
[8] Evaluating top-k selection queries
Chaudhuri, S
Gravano, L
[J]. PROCEEDINGS OF THE TWENTY-FIFTH INTERNATIONAL CONFERENCE ON VERY LARGE DATA BASES, 1999: 399 - 410
[9] Distributed probabilistic top-k dominating queries over uncertain databases
Rai, Niranjan
Lian, Xiang
[J]. Knowledge and Information Systems, 2023, 65: 4939 - 4965
[10] Evaluating continuous top-k queries over document streams
Rao, Weixiong
Chen, Lei
Chen, Shudong
Tarkoma, Sasu
[J]. WORLD WIDE WEB-INTERNET AND WEB INFORMATION SYSTEMS, 2014, 17 (01): 59 - 83
