Optimal algorithms for selecting top-k combinations of attributes: theory and applications

Cited by: 6
Authors
Lin, Chunbin [1]
Lu, Jiaheng [2]
Wei, Zhewei [3]
Wang, Jianguo [1]
Xiao, Xiaokui [4]
Affiliations
[1] Univ Calif San Diego, Dept Comp Sci & Engn, San Diego, CA 92103 USA
[2] Univ Helsinki, Dept Comp Sci, Helsinki, Finland
[3] Renmin Univ China, Sch Informat, Beijing, Peoples R China
[4] Nanyang Technol Univ, Sch Comp Sci & Engn, Singapore, Singapore
Source
VLDB JOURNAL | 2018 / Vol. 27 / Issue 01
Funding
Academy of Finland;
Keywords
Top-k query; Top-k m query; Instance optimal algorithm; KEYWORD SEARCH; RELATIONAL DATABASES; QUERIES;
DOI
10.1007/s00778-017-0485-2
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Traditional top-k algorithms, e.g., TA and NRA, have been successfully applied in many areas such as information retrieval, data mining and databases. They are designed to discover the k objects, e.g., top-k restaurants, with the highest overall scores aggregated from different attributes, e.g., price and location. However, emerging applications such as query recommendation require the best combinations of attributes rather than the best objects. A straightforward extension of existing top-k algorithms is prohibitively expensive for finding top-k combinations because it must enumerate all possible combinations, whose number is exponential in the number of attributes. In this article, we formalize a novel type of top-k query, called top-k, m, which aims to find the top-k combinations of attributes based on the overall scores of the top-m objects within each combination, where m is the number of objects forming a combination. We propose a family of efficient top-k, m algorithms with different data access methods, i.e., sorted accesses and random accesses, and different query certainties, i.e., exact query processing and approximate query processing. Theoretically, we prove that our algorithms are instance optimal and analyze the bound on the depth of accesses. We further develop optimizations for efficient query evaluation that reduce the computational and memory costs and the number of accesses. We provide a case study on real applications of top-k, m queries for an online biomedical search engine. Finally, we perform comprehensive experiments to demonstrate the scalability and efficiency of the top-k, m algorithms on multiple real-life datasets.
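To make the query definition concrete, the following is a minimal sketch of the brute-force baseline that the abstract describes as prohibitively expensive. It is not one of the article's algorithms. Several details are assumptions made only for illustration, since the abstract does not spell them out: attributes are organized into groups, a combination picks one attribute per group, an object's score is the sum of its scores over the chosen attributes, and a combination's score is the sum of its top-m object scores. The function name `naive_top_k_m` and the sample data are hypothetical.

```python
from heapq import nlargest
from itertools import product

def naive_top_k_m(groups, k, m):
    """Brute-force baseline: score every combination, return the best k.

    groups: list of dicts mapping attribute name -> {object id: score}.
    Assumed scoring (not taken from the article): object score = sum over
    the chosen attributes; combination score = sum of the top-m object scores.
    """
    scored = []
    # One attribute per group; the number of combinations is the product
    # of the group sizes, which is what makes this approach expensive.
    for combo in product(*(sorted(g) for g in groups)):
        lists = [groups[i][name] for i, name in enumerate(combo)]
        # Assumption: only objects scored by every chosen attribute count.
        shared = set(lists[0]).intersection(*lists[1:])
        object_scores = [sum(lst[obj] for lst in lists) for obj in shared]
        scored.append((sum(nlargest(m, object_scores)), combo))
    return nlargest(k, scored)

# Hypothetical data: two groups with two attributes each, three objects.
groups = [
    {"A1": {"d1": 9, "d2": 8, "d3": 1}, "A2": {"d1": 5, "d2": 4, "d3": 3}},
    {"B1": {"d1": 7, "d2": 6, "d3": 2}, "B2": {"d1": 1, "d2": 2, "d3": 9}},
]
print(naive_top_k_m(groups, k=1, m=2))
```

With this data, the combination ("A1", "B1") wins with a score of 30, because its two best objects score 16 and 14. The loop visits all four combinations to establish that, which is the enumeration cost the abstract says the proposed algorithms are designed to avoid.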
Pages: 27 - 52
Page count: 26
Related papers
50 in total
  • [1] Optimal algorithms for selecting top-k combinations of attributes: theory and applications
    Chunbin Lin
    Jiaheng Lu
    Zhewei Wei
    Jianguo Wang
    Xiaokui Xiao
    [J]. The VLDB Journal, 2018, 27 : 27 - 52
  • [2] Top-k Algorithms and Applications
    Das, Gautam
    [J]. DATABASE SYSTEMS FOR ADVANCED APPLICATIONS, PROCEEDINGS, 2009, 5463 : 789 - 792
  • [3] Optimal Join Algorithms Meet Top-k
    Tziavelis, Nikolaos
    Gatterbauer, Wolfgang
    Riedewald, Mirek
    [J]. SIGMOD'20: PROCEEDINGS OF THE 2020 ACM SIGMOD INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2020, : 2659 - 2665
  • [4] Parameterized top-K algorithms
    Chen, Jianer
    Kanj, Iyad A.
    Meng, Jie
    Xia, Ge
    Zhang, Fenghui
    [J]. THEORETICAL COMPUTER SCIENCE, 2013, 470 : 105 - 119
  • [5] DC-Top-k: A Novel Top-k Selecting Algorithm and Its Parallelization
    Xue, Zhengyuan
    Li, Ruixuan
    Zhang, Heng
    Gu, Xiwu
    Xu, Zhiyong
    [J]. PROCEEDINGS 45TH INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING - ICPP 2016, 2016, : 370 - 379
  • [6] Crowdsourced Top-k Algorithms: An Experimental Evaluation
    Zhang, Xiaohang
    Li, Guoliang
    Feng, Jianhua
    [J]. PROCEEDINGS OF THE VLDB ENDOWMENT, 2016, 9 (08): : 612 - 623
  • [7] Top-k Similarity Matching in Large Graphs with Attributes
    Ding, Xiaofeng
    Jia, Jianhong
    Li, Jiuyong
    Liu, Jixue
    Jin, Hai
    [J]. DATABASE SYSTEMS FOR ADVANCED APPLICATIONS, DASFAA 2014, PT II, 2014, 8422 : 156 - 170
  • [8] Finding Top-k Optimal Sequenced Routes
    Liu, Huiping
    Jin, Cheqing
    Yang, Bin
    Zhou, Aoying
    [J]. 2018 IEEE 34TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE), 2018, : 569 - 580
  • [9] Top-k document retrieval in optimal space
    Tsur, Dekel
    [J]. INFORMATION PROCESSING LETTERS, 2013, 113 (12) : 440 - 443
  • [10] A Fuzzy Framework For Selecting Top-k Web Service Compositions
    Benouaret, Karim
    Benslimane, Djamal
    Hadjali, Allel
    [J]. APPLIED COMPUTING REVIEW, 2011, 11 (03): : 32 - 40