Efficient Evaluation of Continuous Text Search Queries

Cited by: 11
Authors
Mouratidis, Kyriakos [1]
Pang, HweeHwa [1]
Affiliations
[1] Singapore Management Univ, Sch Informat Syst, Singapore 178902, Singapore
Keywords
Continuous queries; document streams; text filtering; K SELECTION QUERIES; MAINTENANCE; STRATEGIES;
DOI
10.1109/TKDE.2011.125
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Consider a text filtering server that monitors a stream of incoming documents for a set of users, who register their interests in the form of continuous text search queries. The task of the server is to constantly maintain for each query a ranked result list, comprising the recent documents (drawn from a sliding window) with the highest similarity to the query. Such a system underlies many text monitoring applications that need to cope with heavy document traffic, such as news and email monitoring. In this paper, we propose the first solution for processing continuous text queries efficiently. Our objective is to support a large number of user queries while sustaining high document arrival rates. Our solution indexes the streamed documents in main memory with a structure based on the principles of the inverted file, and processes document arrival and expiration events with an incremental threshold-based method. We distinguish between two versions of the monitoring algorithm, an eager and a lazy one, which differ in how aggressively they manage the thresholds on the inverted index. Using benchmark queries over a stream of real documents, we experimentally verify the efficiency of our methodology; both its versions are at least an order of magnitude faster than a competitor constructed from existing techniques, with lazy being the best approach overall.
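The setting described in the abstract can be illustrated with a minimal sketch. It is not the eager or lazy threshold-based algorithm of the paper: it is a naive baseline that keeps an in-memory inverted index over a count-based sliding window and recomputes a query's result whenever one of its current result documents expires. The class name `SlidingWindowMonitor`, its methods, the dot-product similarity, and the count-based window are all assumptions made for illustration.

```python
from collections import defaultdict, deque
import heapq


class SlidingWindowMonitor:
    """Naive continuous top-k monitor (illustrative, not the paper's method)."""

    def __init__(self, window_size, k):
        self.window_size = window_size       # number of most recent documents kept
        self.k = k                           # result size per query
        self.window = deque()                # (doc_id, {term: weight}) in arrival order
        self.postings = defaultdict(dict)    # inverted index: term -> {doc_id: weight}
        self.queries = {}                    # query_id -> {term: weight}
        self.query_terms = defaultdict(set)  # term -> ids of queries containing it
        self.results = {}                    # query_id -> {doc_id: score}

    def register(self, query_id, terms):
        """Register a continuous query and compute its initial result."""
        self.queries[query_id] = terms
        for term in terms:
            self.query_terms[term].add(query_id)
        self.results[query_id] = self._recompute(query_id)

    def _recompute(self, query_id):
        """Evaluate the query from scratch against the inverted index."""
        scores = defaultdict(float)
        for term, q_weight in self.queries[query_id].items():
            for doc_id, d_weight in self.postings.get(term, {}).items():
                scores[doc_id] += q_weight * d_weight
        return dict(heapq.nlargest(self.k, scores.items(), key=lambda kv: kv[1]))

    def arrive(self, doc_id, terms):
        """Handle a document arrival, expiring the oldest document if needed."""
        if len(self.window) == self.window_size:
            self._expire()
        self.window.append((doc_id, terms))
        affected = set()
        for term, weight in terms.items():
            self.postings[term][doc_id] = weight
            affected |= self.query_terms.get(term, set())
        # Only queries sharing at least one term with the document are touched.
        for query_id in affected:
            query = self.queries[query_id]
            score = sum(qw * terms.get(t, 0.0) for t, qw in query.items())
            result = self.results[query_id]
            result[doc_id] = score
            if len(result) > self.k:
                del result[min(result, key=result.get)]

    def _expire(self):
        """Drop the oldest document and repair the results that contained it."""
        doc_id, terms = self.window.popleft()
        for term in terms:
            del self.postings[term][doc_id]
            if not self.postings[term]:
                del self.postings[term]
        for query_id, result in list(self.results.items()):
            if doc_id in result:
                self.results[query_id] = self._recompute(query_id)

    def top(self, query_id):
        """Return the current result as (doc_id, score) pairs, best first."""
        return sorted(self.results[query_id].items(), key=lambda kv: -kv[1])
```

With a window of two documents and `k = 1`, a query on the term `stream` first reports the best matching document; once that document slides out of the window, the result falls back to the next best one still inside it. The full recomputation on expiration is the cost that an incremental threshold-based method, as described in the abstract, is meant to reduce.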
Pages: 1469 - 1482
Number of pages: 14
Related papers
50 in total
  • [41] Efficient evaluation of HAVING queries on a probabilistic database
    Re, Christopher
    Suciu, Dan
    DATABASE PROGRAMMING LANGUAGES, 2007, 4797 : 186 - +
  • [42] Efficient in-network evaluation of multiple queries
    Pandit, Vinayaka
    Ji, Hui-bo
    High Performance Computing - HiPC 2006, Proceedings, 2006, 4297 : 205 - 216
  • [43] EFFICIENT EVALUATION OF ARBITRARY RELATIONAL CALCULUS QUERIES
    Raszyk, Martin
    Basin, David
    Krstic, Srdan
    Traytel, Dmitriy
    LOGICAL METHODS IN COMPUTER SCIENCE, 2023, 19 (04)
  • [44] Efficient evaluation of sibling relationship in XPath queries
    Wan, CX
    Liu, XP
    Lin, DH
    ADVANCES IN COMPUTER SCIENCE - ASIAN 2005, PROCEEDINGS: DATA MANAGEMENT ON THE WEB, 2005, 3818 : 193 - 207
  • [45] Efficient evaluation of XML path queries with automata
    Sun, B
    Lv, JH
    Wang, GR
    Yu, G
    Zhou, B
    ADVANCES IN WEB-AGE INFORMATION MANAGEMENT, PROCEEDINGS, 2003, 2762 : 116 - 127
  • [46] Efficient evaluation of specific queries in constraint databases
    Grimson, Rafael
    Heintz, Joos
    Kuijpers, Bart
    INFORMATION PROCESSING LETTERS, 2011, 111 (19) : 941 - 944
  • [47] A motion-aware approach for efficient evaluation of continuous queries on 3D object databases
    Ali, Mohammed Eunus
    Tanin, Egemen
    Zhang, Rui
    Kulik, Lars
    VLDB JOURNAL, 2010, 19 (05) : 603 - 632
  • [48] A motion-aware approach for efficient evaluation of continuous queries on 3D object databases
    Mohammed Eunus Ali
    Egemen Tanin
    Rui Zhang
    Lars Kulik
    The VLDB Journal, 2010, 19 : 603 - 632
  • [49] Efficient Compressed Inverted Index Skipping for Disjunctive Text-Queries
    Jonassen, Simon
    Bratsberg, Svein Erik
    ADVANCES IN INFORMATION RETRIEVAL, 2011, 6611 : 530 - 542
  • [50] Efficient Fuzzy Search in Large Text Collections
    Bast, Hannah
    Celikik, Marjan
    ACM TRANSACTIONS ON INFORMATION SYSTEMS, 2013, 31 (02)