Optimal algorithms for selecting top-k combinations of attributes: theory and applications

Cited by: 6
Authors
Lin, Chunbin [1]
Lu, Jiaheng [2]
Wei, Zhewei [3]
Wang, Jianguo [1]
Xiao, Xiaokui [4]
Affiliations
[1] Univ Calif San Diego, Dept Comp Sci & Engn, San Diego, CA 92103 USA
[2] Univ Helsinki, Dept Comp Sci, Helsinki, Finland
[3] Renmin Univ China, Sch Informat, Beijing, Peoples R China
[4] Nanyang Technol Univ, Sch Comp Sci & Engn, Singapore, Singapore
Source
VLDB JOURNAL | 2018 / Vol. 27 / Issue 01
Funding
Academy of Finland;
Keywords
Top-k query; Top-k m query; Instance optimal algorithm; KEYWORD SEARCH; RELATIONAL DATABASES; QUERIES;
DOI
10.1007/s00778-017-0485-2
Chinese Library Classification
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Traditional top-k algorithms, e.g., TA and NRA, have been successfully applied in many areas such as information retrieval, data mining and databases. They are designed to discover the k objects, e.g., top-k restaurants, with the highest overall scores aggregated from different attributes, e.g., price and location. However, emerging applications like query recommendation require the best combinations of attributes rather than objects. A straightforward extension of the existing top-k algorithms is prohibitively expensive for answering top-k combinations because it needs to enumerate all possible combinations, whose number is exponential in the number of attributes. In this article, we formalize a novel type of top-k query, called top-k, m, which aims to find the top-k combinations of attributes based on the overall scores of the top-m objects within each combination, where m is the number of objects forming a combination. We propose a family of efficient top-k, m algorithms with different data access methods, i.e., sorted accesses and random accesses, and different query certainties, i.e., exact query processing and approximate query processing. Theoretically, we prove that our algorithms are instance optimal and analyze the bound on the depth of accesses. We further develop optimizations for efficient query evaluation that reduce the computational and memory costs and the number of accesses. We provide a case study on real applications of top-k, m queries for an online biomedical search engine. Finally, we perform comprehensive experiments to demonstrate the scalability and efficiency of top-k, m algorithms on multiple real-life datasets.
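To make the query definition concrete, the following is a minimal brute-force sketch of the exhaustive baseline that the abstract describes as prohibitively expensive. It is not the authors' algorithm. The data model is an assumption made for illustration: attributes are organized into groups, a combination picks one attribute per group, each attribute maps object ids to scores, an object's score within a combination is the sum of its per-attribute scores, and only objects present in every chosen attribute are counted. The names top_k_m and example_groups are hypothetical.

```python
from heapq import nlargest
from itertools import product


def top_k_m(groups, k, m):
    """Brute-force top-k, m sketch (illustrative assumptions, see lead-in).

    groups: list of dicts, each mapping attribute name -> {object_id: score}.
    Returns the k best (combination_score, attribute_names) pairs.
    """
    scored = []
    # Enumerate every combination: one attribute from each group.
    for combo in product(*(g.items() for g in groups)):
        names = tuple(name for name, _ in combo)
        lists = [scores for _, scores in combo]
        # Assumption: only objects present in all chosen attributes count.
        common = set(lists[0]).intersection(*lists[1:])
        # Assumption: an object's score is the sum over the chosen attributes.
        object_scores = [sum(lst[obj] for lst in lists) for obj in common]
        # Combination score: total of its m best object scores.
        scored.append((sum(nlargest(m, object_scores)), names))
    return nlargest(k, scored)


# Hypothetical toy data: two groups, integer scores.
example_groups = [
    {"a1": {"o1": 9, "o2": 8}, "a2": {"o1": 2, "o3": 5}},
    {"b1": {"o1": 7, "o2": 6, "o3": 4}},
]

print(top_k_m(example_groups, k=1, m=2))  # [(30, ('a1', 'b1'))]
```

The loop visits the full Cartesian product of the groups, so the work grows with the product of the group sizes; this is the cost that the sorted-access and random-access algorithms summarized above aim to avoid.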
Pages: 27-52
Page count: 26
Related papers
50 in total
  • [31] Parallel Top-K Similarity Join Algorithms Using MapReduce
    Kim, Younghoon
    Shim, Kyuseok
    2012 IEEE 28TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE), 2012, : 510 - 521
  • [32] Algorithms for Top-k join queries in wireless sensor networks
    Mo, Shang-Feng
    Chen, Ding-Jie
    Chen, Hong
    Li, Ying-Long
    Li, Cui-Ping
    Jisuanji Xuebao/Chinese Journal of Computers, 2013, 36 (03): : 557 - 570
  • [33] Efficient Top-K Query Algorithms Using Density Index
    Chen, Dongqu
    Sun, Guang-Zhong
    Gong, Neil Zhenqiang
    Zhong, Xiaoqiang
    APPLIED INFORMATICS AND COMMUNICATION, PT I, 2011, 224 : 38 - +
  • [34] Efficient Algorithms for Mining Top-K High Utility Itemsets
    Tseng, Vincent S.
    Wu, Cheng-Wei
    Fournier-Viger, Philippe
    Yu, Philip S.
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2016, 28 (01) : 54 - 67
  • [35] Efficient processing of top-k queries: selective NRA algorithms
    Yuan, Jing
    Sun, Guangzhong
    Luo, Tao
    Lian, Defu
    Chen, Guoliang
    JOURNAL OF INTELLIGENT INFORMATION SYSTEMS, 2012, 39 (03) : 687 - 710
  • [36] An Experimental Evaluation of Aggregation Algorithms for Processing Top-K Queries
    Zhu, Liang
    Ma, Qin
    Meng, Weiyi
    Yang, Mingqian
    Yuan, Fang
    CIT/IUCC/DASC/PICOM 2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER AND INFORMATION TECHNOLOGY - UBIQUITOUS COMPUTING AND COMMUNICATIONS - DEPENDABLE, AUTONOMIC AND SECURE COMPUTING - PERVASIVE INTELLIGENCE AND COMPUTING, 2015, : 326 - 333
  • [37] Efficient Top-K Query Algorithms Using Density Index
    Chen, Dongqu
    Sun, Guang-Zhong
    Gong, Neil Zhenqiang
    Zhong, Xiaoqiang
    2010 THE 3RD INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND INDUSTRIAL APPLICATION (PACIIA2010), VOL I, 2010, : 33 - +
  • [38] Efficient processing of top-k queries: selective NRA algorithms
    Jing Yuan
    Guangzhong Sun
    Tao Luo
    Defu Lian
    Guoliang Chen
    Journal of Intelligent Information Systems, 2012, 39 : 687 - 710
  • [39] Efficient and Robust Top-k Algorithms for Big Data IoT
    Yang, Ruifan
    Zhou, Zheng
    Tseng, Lewis
    Alogaily, Moayad
    Boukerche, Azzedine
    ICC 2020 - 2020 IEEE INTERNATIONAL CONFERENCE ON COMMUNICATIONS (ICC), 2020,
  • [40] Efficient Top-k Query Algorithms Using K-Skyband Partition
    Gong, Zhenqiang
    Sun, Guang-Zhong
    Yuan, Jing
    Zhong, Yanjing
    SCALABLE INFORMATION SYSTEMS, 2009, 18 : 288 - 305