A practical approach for efficiently answering top-k relational queries

Cited by: 6
Authors
Ayanso, Anteneh
Goes, Paulo B.
Mehta, Kumar
Affiliations
[1] George Mason Univ, Sch Management, Decis Sci & Management Informat Syst, Fairfax, VA 22030 USA
[2] Brock Univ, Dept Finance Operat & Informat Syst, St Catharines, ON L2S 3A1, Canada
[3] Univ Connecticut, Dept Operat & Informat Management, Storrs, CT 06269 USA
Keywords
similarity search; top-k query; uncertainty modeling; RDBMSs
DOI
10.1016/j.dss.2007.04.005
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification code
081104; 0812; 0835; 1405
Abstract
An increasing number of application areas now rely on obtaining the "best matches" to a given query, as opposed to the exact matches sought by traditional transactions. This type of exploratory querying (also called top-k querying) can significantly improve the performance of web-based applications such as consumer reviews, price comparisons, and recommendations for products/services. Due to the lack of support for specialized indexes and/or data structures in relational database management systems (RDBMSs), recent research has focused on utilizing summary statistics (histograms) maintained by RDBMSs for translating the top-k request into a traditional range query. Because RDBMS query engines are already optimized for the execution of range queries, such an approach has both practical and efficiency advantages. In this paper, we review the strengths and weaknesses of common histogram construction techniques with regard to their structural characteristics, accuracy in approximating the true distribution of the underlying data, and implications for top-k retrieval. We also present our top-k retrieval strategy (Query-Level Optimal Cost Strategy, QLOCS) and demonstrate its "histogram-independent" performance. Based on comparative experimental and statistical analyses with the best-known histogram-based strategy in the literature, we show that QLOCS is not only more efficient but also provides more consistent performance across commonly used histogram types in RDBMSs. © 2007 Elsevier B.V. All rights reserved.
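The general idea mentioned in the abstract, translating a top-k request into a range query with the help of a histogram, can be illustrated with a minimal one-dimensional sketch. This is not QLOCS and not the strategy of any cited work; it only assumes that values are spread uniformly inside each histogram bucket, and it simply widens the range and retries when the first range returns fewer than k tuples. All function names, the bucket layout (`edges`, `counts`), and the in-memory sorted list that stands in for the database table are hypothetical.

```python
from bisect import bisect_left, bisect_right


def estimated_count(edges, counts, lo, hi):
    """Estimate the number of tuples in [lo, hi], assuming values are
    uniformly distributed inside each bucket (an assumption of this sketch)."""
    total = 0.0
    for i, c in enumerate(counts):
        b_lo, b_hi = edges[i], edges[i + 1]
        overlap = min(hi, b_hi) - max(lo, b_lo)
        if overlap > 0:
            total += c * overlap / (b_hi - b_lo)
    return total


def estimate_radius(edges, counts, q, k, steps=40):
    """Binary-search a small radius r whose estimated count in
    [q - r, q + r] reaches k."""
    lo_r, hi_r = 0.0, max(q - edges[0], edges[-1] - q)
    for _ in range(steps):
        mid = (lo_r + hi_r) / 2
        if estimated_count(edges, counts, q - mid, q + mid) >= k:
            hi_r = mid
        else:
            lo_r = mid
    return hi_r


def top_k(sorted_values, edges, counts, q, k):
    """Answer a top-k request with range queries, widening on a shortfall."""
    max_r = max(q - edges[0], edges[-1] - q)
    r = estimate_radius(edges, counts, q, k)
    while True:
        # Stands in for: SELECT ... WHERE attr BETWEEN q - r AND q + r
        i = bisect_left(sorted_values, q - r)
        j = bisect_right(sorted_values, q + r)
        hits = sorted_values[i:j]
        if len(hits) >= k or r >= max_r:
            break
        r = min(2 * r, max_r)  # too few tuples: restart with a wider range
    # Rank the candidates by distance to the query point and keep the best k.
    return sorted(hits, key=lambda v: abs(v - q))[:k]


values = [1, 2, 3, 10, 11, 12, 13, 20, 30, 40]
edges = [0, 10, 20, 30, 40]   # four equi-width buckets
counts = [3, 4, 1, 2]         # tuples per bucket
nearest = top_k(values, edges, counts, q=12, k=3)
```

Because the range is symmetric around the query point, any range that holds at least k tuples also holds the true k nearest ones, so the final ranking step is exact. The cost trade-off is in the radius: too small forces a restart, too large retrieves more tuples than needed.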
Pages: 326-349
Page count: 24