Evaluating continuous top-k queries over document streams

Cited by: 0
Authors
Weixiong Rao
Lei Chen
Shudong Chen
Sasu Tarkoma
Affiliations
[1] Hong Kong University of Science and Technology, Computer Science & Engineering Department
[2] Chinese Academy of Sciences, Institute of Microelectronics
[3] China R&D Center for Internet of Things, Department of Computer Science
[4] University of Helsinki
Source
World Wide Web | 2014 / Vol. 17
Keywords
top-k query; information filtering; web document streams
DOI
Not available
Abstract
In the age of Web 2.0, Web content has become live, and users want to receive content of interest automatically. The popular RSS subscription approach cannot offer fine-grained filtering. In this paper, we propose a personalized subscription approach over live Web content. Each document is represented by pairs of terms and weights, and each user defines a continuous top-k query. Based on an aggregation function that measures the relevance between a document and a query, the user continuously receives the top-k most relevant documents inside a sliding window. The challenge of this subscription approach is its high processing cost, especially when the number of queries is very large. Our basic idea is to share evaluation results among queries. Based on a defined covering relationship between queries, we identify the relations among the aggregation scores of such queries and develop a graph indexing structure (GIS) to maintain the queries. Next, based on the GIS, we propose a document evaluation algorithm that shares query results among queries. We then reuse previously evaluated documents and design a document indexing structure (DIS) to maintain these history documents. Finally, we adopt a cost-model-based approach to unify the use of GIS and DIS. The experimental results show that our solution outperforms previous work that uses the classic inverted list structure.
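The setting described in the abstract can be illustrated with a minimal Python sketch. It is a naive per-query baseline, not the paper's GIS or DIS. Several details are assumptions made only for illustration: the aggregation function is taken to be the sum of the weights of matching terms, term weights are taken to be non-negative, the sliding window is count-based, and one query is said to "cover" another when its term set is a superset. The names `score`, `covers`, and `TopKQuery` are hypothetical.

```python
import heapq
from collections import deque
from typing import Deque, Dict, List, Set, Tuple

# A document is a mapping term -> weight (weights assumed non-negative).
Document = Dict[str, float]


def score(doc: Document, query_terms: Set[str]) -> float:
    """Assumed aggregation function: sum of the weights of matching terms."""
    return sum(w for term, w in doc.items() if term in query_terms)


def covers(q_big: Set[str], q_small: Set[str]) -> bool:
    """Assumed covering relation: q_big contains every term of q_small."""
    return q_small <= q_big


class TopKQuery:
    """Naive continuous top-k query over a count-based sliding window."""

    def __init__(self, terms: Set[str], k: int, window_size: int) -> None:
        self.terms = terms
        self.k = k
        # deque(maxlen=...) silently drops the oldest entry when full.
        self.window: Deque[Tuple[int, float]] = deque(maxlen=window_size)

    def on_document(self, doc_id: int, doc: Document) -> List[Tuple[int, float]]:
        """Score one arriving document and return the current top-k."""
        self.window.append((doc_id, score(doc, self.terms)))
        return heapq.nlargest(self.k, self.window, key=lambda p: p[1])


# Toy stream of four documents (made-up weights).
stream = [
    {"web": 0.5, "stream": 0.25},
    {"query": 1.0},
    {"web": 0.25, "query": 0.5},
    {"stream": 0.75, "web": 0.75},
]
query = TopKQuery({"web", "stream"}, k=2, window_size=3)
for doc_id, doc in enumerate(stream):
    top_k = query.on_document(doc_id, doc)
```

Under these assumptions, if one query covers another, then every document scores at least as high against the covering query as against the covered one. A bound of this kind is what makes it plausible to share evaluation results among queries rather than scoring each document against every query independently, as this baseline does.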
Pages: 59-83
Page count: 24
Related papers
  • [1] Evaluating continuous top-k queries over document streams
    Rao, Weixiong
    Chen, Lei
    Chen, Shudong
    Tarkoma, Sasu
    [J]. WORLD WIDE WEB-INTERNET AND WEB INFORMATION SYSTEMS, 2014, 17 (01) : 59 - 83
  • [2] Continuous Top-k Monitoring on Document Streams
    Hou, Leong U.
    Zhang, Junjie
    Mouratidis, Kyriakos
    Li, Ye
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2017, 29 (05) : 991 - 1003
  • [3] Continuous Monitoring of Top-k Dominating Queries over Uncertain Data Streams
    Li, Guohui
    Luo, Changyin
    Li, Jianjun
    [J]. WEB INFORMATION SYSTEMS ENGINEERING - WISE 2014, PT I, 2014, 8786 : 244 - 255
  • [4] Evaluating TOP-K Queries Over Business Processes
    Deutch, Daniel
    Milo, Tova
    [J]. ICDE: 2009 IEEE 25TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING, VOLS 1-3, 2009 : 1195 - 1198
  • [5] Upsortable: Programming Top-K Queries Over Data Streams
    Subercaze, Julien
    Gravier, Christophe
    Gillani, Syed
    Kammoun, Abderrahmen
    Laforest, Frederique
    [J]. PROCEEDINGS OF THE VLDB ENDOWMENT, 2017, 10 (12) : 1873 - 1876
  • [6] Continuous Top-k Monitoring on Document Streams (Extended Abstract)
    Hou, Leong U.
    Zhang, Junjie
    Mouratidis, Kyriakos
    Li, Ye
    [J]. 2018 IEEE 34TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE), 2018 : 1803 - 1804
  • [7] Evaluating top-k selection queries
    Chaudhuri, S
    Gravano, L
    [J]. PROCEEDINGS OF THE TWENTY-FIFTH INTERNATIONAL CONFERENCE ON VERY LARGE DATA BASES, 1999 : 399 - 410
  • [8] Evaluating Top-k Skyline queries over relational databases
    Brando, Carmen
    Goncalves, Marlene
    Gonzalez, Vanessa
    [J]. DATABASE AND EXPERT SYSTEMS APPLICATIONS, PROCEEDINGS, 2007, 4653 : 254 - +
  • [9] Continuous Top-k Dominating Queries
    Kontaki, Maria
    Papadopoulos, Apostolos N.
    Manolopoulos, Yannis
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2012, 24 (05) : 840 - 853
  • [10] Top-K Color Queries for Document Retrieval
    Karpinski, Marek
    Nekrich, Yakov
    [J]. PROCEEDINGS OF THE TWENTY-SECOND ANNUAL ACM-SIAM SYMPOSIUM ON DISCRETE ALGORITHMS, 2011 : 401 - 411