Efficient Evaluation of Continuous Text Search Queries

Cited by: 11
Authors
Mouratidis, Kyriakos [1 ]
Pang, HweeHwa [1 ]
Affiliations
[1] Singapore Management Univ, Sch Informat Syst, Singapore 178902, Singapore
Keywords
Continuous queries; document streams; text filtering; K SELECTION QUERIES; MAINTENANCE; STRATEGIES;
DOI
10.1109/TKDE.2011.125
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Consider a text filtering server that monitors a stream of incoming documents for a set of users, who register their interests in the form of continuous text search queries. The task of the server is to constantly maintain for each query a ranked result list, comprising the recent documents (drawn from a sliding window) with the highest similarity to the query. Such a system underlies many text monitoring applications that need to cope with heavy document traffic, such as news and email monitoring. In this paper, we propose the first solution for processing continuous text queries efficiently. Our objective is to support a large number of user queries while sustaining high document arrival rates. Our solution indexes the streamed documents in main memory with a structure based on the principles of the inverted file, and processes document arrival and expiration events with an incremental threshold-based method. We distinguish between two versions of the monitoring algorithm, an eager and a lazy one, which differ in how aggressively they manage the thresholds on the inverted index. Using benchmark queries over a stream of real documents, we experimentally verify the efficiency of our methodology; both its versions are at least an order of magnitude faster than a competitor constructed from existing techniques, with lazy being the best approach overall.
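The monitoring task described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' eager or lazy algorithm: it is a simplified stand-in that keeps a count-based sliding window, an inverted index over the windowed documents, and, per query, a threshold equal to the score of its current k-th result. The class name `ContinuousTopK`, its methods, the dot-product similarity, and the count-based window are all assumptions made for illustration; term weights are assumed to be positive.

```python
from collections import defaultdict, deque
import heapq


class ContinuousTopK:
    """Simplified sliding-window top-k monitor (illustrative, hypothetical API)."""

    def __init__(self, window_size, k):
        self.window_size = window_size   # count-based window: keep the last N documents
        self.k = k
        self.window = deque()            # document ids in arrival order
        self.docs = {}                   # doc id -> {term: weight}
        self.index = defaultdict(dict)   # inverted index: term -> {doc id: weight}
        self.queries = {}                # query id -> {term: weight}
        self.results = {}                # query id -> [(score, doc id), ...], best first

    def _score(self, query, doc):
        # Assumed similarity: dot product of term weights.
        return sum(w * doc.get(t, 0.0) for t, w in query.items())

    def _recompute(self, qid):
        # Rebuild a result from scratch, scanning only the query's inverted lists.
        scores = defaultdict(float)
        for term, qw in self.queries[qid].items():
            for doc_id, dw in self.index.get(term, {}).items():
                scores[doc_id] += qw * dw
        self.results[qid] = heapq.nlargest(
            self.k, ((s, d) for d, s in scores.items()))

    def register(self, qid, query):
        self.queries[qid] = query
        self._recompute(qid)

    def add_document(self, doc_id, doc):
        if len(self.window) == self.window_size:
            self._expire()
        self.window.append(doc_id)
        self.docs[doc_id] = doc
        for term, w in doc.items():
            self.index[term][doc_id] = w
        for qid, query in self.queries.items():
            score = self._score(query, doc)
            if score <= 0:
                continue
            result = self.results[qid]
            # Threshold: score of the current k-th result (0 while the list is short).
            threshold = result[-1][0] if len(result) == self.k else 0.0
            if score > threshold:
                result.append((score, doc_id))
                result.sort(reverse=True)
                del result[self.k:]

    def _expire(self):
        old_id = self.window.popleft()
        old = self.docs.pop(old_id)
        for term in old:
            del self.index[term][old_id]
        # Only queries whose result contained the expired document need a refill.
        for qid, result in self.results.items():
            if any(d == old_id for _, d in result):
                self._recompute(qid)
```

In this sketch an arriving document touches a query's result only when its score beats the query's threshold, and an expiring document triggers recomputation only for the queries that held it. It still scores every arrival against every registered query, which is the kind of cost a threshold-aware index is meant to avoid.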
Pages: 1469-1482
Page count: 14
Related papers
50 in total
  • [1] An Incremental Threshold Method for Continuous Text Search Queries
    Mouratidis, Kyriakos
    Pang, HweeHwa
    ICDE: 2009 IEEE 25TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING, VOLS 1-3, 2009, : 1187 - 1190
  • [2] Continuous similarity search for evolving queries
    Xiaoning Xu
    Chuancong Gao
    Jian Pei
    Ke Wang
    Abdullah Al-Barakati
    Knowledge and Information Systems, 2016, 48 : 649 - 678
  • [3] Continuous similarity search for evolving queries
    Xu, Xiaoning
    Gao, Chuancong
    Pei, Jian
    Wang, Ke
    Al-Barakati, Abdullah
    KNOWLEDGE AND INFORMATION SYSTEMS, 2016, 48 (03) : 649 - 678
  • [4] Evolving Lucene Search Queries for Text Classification
    Hirsch, Laurence
    Hirsch, Robin
    Saeedi, Masoud
    GECCO 2007: GENETIC AND EVOLUTIONARY COMPUTATION CONFERENCE, VOL 1 AND 2, 2007, : 1604 - +
  • [5] Efficient Computation of Search Computing Queries
    Braga, Daniele
    Grossniklaus, Michael
    Corcoglioniti, Francesco
    Vadacca, Salvatore
    SEARCH COMPUTING: TRENDS AND DEVELOPMENTS, 2011, 6585 : 141 - 155
  • [6] An efficient approach for continuous density queries
    Jie Wen
    Xiaofeng Meng
    Xing Hao
    Jianliang Xu
    Frontiers of Computer Science, 2012, 6 : 581 - 595
  • [7] Efficient Maintenance of Continuous Queries for Trajectories
    Hui Ding
    Goce Trajcevski
    Peter Scheuermann
    GeoInformatica, 2008, 12 : 255 - 288
  • [8] An efficient approach for continuous density queries
    Wen, Jie
    Meng, Xiaofeng
    Hao, Xing
    Xu, Jianliang
    FRONTIERS OF COMPUTER SCIENCE, 2012, 6 (05) : 581 - 595
  • [9] Efficient maintenance of continuous queries for trajectories
    Ding, Hui
    Trajcevski, Goce
    Scheuermann, Peter
    GEOINFORMATICA, 2008, 12 (03) : 255 - 288
  • [10] Embellishing Text Search Queries To Protect User Privacy
    Pang, HweeHwa
    Ding, Xuhua
    Xiao, Xiaokui
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2010, 3 (01): : 598 - 607