Efficient Evaluation of Continuous Text Search Queries

Cited by: 11
Authors
Mouratidis, Kyriakos [1]
Pang, HweeHwa [1]
Affiliations
[1] Singapore Management Univ, Sch Informat Syst, Singapore 178902, Singapore
Keywords
Continuous queries; document streams; text filtering; K SELECTION QUERIES; MAINTENANCE; STRATEGIES;
DOI
10.1109/TKDE.2011.125
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Consider a text filtering server that monitors a stream of incoming documents for a set of users, who register their interests in the form of continuous text search queries. The task of the server is to constantly maintain for each query a ranked result list, comprising the recent documents (drawn from a sliding window) with the highest similarity to the query. Such a system underlies many text monitoring applications that need to cope with heavy document traffic, such as news and email monitoring. In this paper, we propose the first solution for processing continuous text queries efficiently. Our objective is to support a large number of user queries while sustaining high document arrival rates. Our solution indexes the streamed documents in main memory with a structure based on the principles of the inverted file, and processes document arrival and expiration events with an incremental threshold-based method. We distinguish between two versions of the monitoring algorithm, an eager and a lazy one, which differ in how aggressively they manage the thresholds on the inverted index. Using benchmark queries over a stream of real documents, we experimentally verify the efficiency of our methodology; both its versions are at least an order of magnitude faster than a competitor constructed from existing techniques, with lazy being the best approach overall.
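The setting described in the abstract (a sliding window of recent documents, an in-memory inverted index, and a ranked top-k result per query) can be illustrated with a small sketch. This is not the authors' eager or lazy threshold-based algorithm, which the abstract does not specify in detail; it is a naive baseline that recomputes a query's result from scratch on demand. The class name `SlidingWindowIndex`, the count-based window, and the dot-product similarity over term weights are all assumptions made for illustration.

```python
from collections import defaultdict, deque
import heapq


class SlidingWindowIndex:
    """Hypothetical in-memory inverted index over a count-based sliding window."""

    def __init__(self, window_size):
        self.window_size = window_size
        self.window = deque()              # (doc_id, terms) pairs, oldest first
        self.postings = defaultdict(dict)  # term -> {doc_id: weight}

    def add(self, doc_id, term_weights):
        """Handle a document arrival; expire the oldest document if the window is full."""
        if len(self.window) == self.window_size:
            self._expire(*self.window.popleft())
        self.window.append((doc_id, list(term_weights)))
        for term, weight in term_weights.items():
            self.postings[term][doc_id] = weight

    def _expire(self, doc_id, terms):
        """Remove an expired document from every posting list it appears in."""
        for term in terms:
            entries = self.postings[term]
            entries.pop(doc_id, None)
            if not entries:
                del self.postings[term]

    def top_k(self, query_weights, k):
        """Naively re-evaluate a query: dot-product score, k highest-scoring documents."""
        scores = defaultdict(float)
        for term, q_weight in query_weights.items():
            for doc_id, d_weight in self.postings.get(term, {}).items():
                scores[doc_id] += q_weight * d_weight
        return heapq.nlargest(k, scores.items(), key=lambda item: (item[1], item[0]))


index = SlidingWindowIndex(window_size=2)
index.add("d1", {"stream": 1.0, "query": 0.5})
index.add("d2", {"stream": 0.2})
query = {"stream": 1.0, "query": 1.0}
first_result = index.top_k(query, k=1)

index.add("d3", {"query": 0.9})  # window is full, so d1 expires here
second_result = index.top_k(query, k=2)
```

Recomputing every registered query on each arrival or expiration is the cost that an incremental method of the kind the abstract describes is meant to avoid; the sketch only makes the data structures and events concrete.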
Pages: 1469-1482
Number of pages: 14
Related papers
50 in total
  • [31] Geneshot: search engine for ranking genes from arbitrary text queries
    Lachmann, Alexander
    Schilder, Brian M.
    Wojciechowicz, Megan L.
    Torre, Denis
    Kuleshov, Maxim V.
    Keenan, Alexandra B.
    Ma'ayan, Avi
    NUCLEIC ACIDS RESEARCH, 2019, 47 (W1) : W571 - W577
  • [32] Efficient Geometric Pruning Strategies for Continuous Skyline Queries
    Zheng, Jiping
    Chen, Jialiang
    Wang, Haixiang
    ISPRS INTERNATIONAL JOURNAL OF GEO-INFORMATION, 2017, 6 (03):
  • [33] Optimization of bounded continuous search queries based on ranking distributions
    Kukulenz, D.
    Hoeller, N.
    Groppe, S.
    Linnemann, V.
    WEB INFORMATION SYSTEMS ENGINEERING - WISE 2007, PROCEEDINGS, 2007, 4831 : 26 - 37
  • [34] SensPrecOptimizer: a software tool that combined search queries to design efficient search strategies
    Mesgarpour, Bita
    Mesgarpour, Mohsen
    Herkner, Harald
    JOURNAL OF CLINICAL EPIDEMIOLOGY, 2016, 71 : 122 - 123
  • [35] What financial topics do people search for? An analysis of search queries using text mining
    Jalil, Nursabrina Abdul
    Hamid, Suraya
    JOURNAL OF INFORMATION SCIENCE, 2024,
  • [36] Continuous Similarity Search for Dynamic Text Streams
    Tsuchida, Yuma
    Kubo, Kohei
    Koga, Hisashi
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2023, E106D (12) : 2026 - 2035
  • [37] Efficient Encrypted Data Search With Expressive Queries and Flexible Update
    Ning, Jianting
    Chen, Jiageng
    Liang, Kaitai
    Liu, Joseph K.
    Su, Chunhua
    Wu, Qianhong
    IEEE TRANSACTIONS ON SERVICES COMPUTING, 2022, 15 (03) : 1619 - 1633
  • [38] Efficient techniques for range search queries on earth science data
    Shi, QM
    JaJa, JF
    14TH INTERNATIONAL CONFERENCE ON SCIENTIFIC AND STATISTICAL DATABASE MANAGEMENT, PROCEEDINGS, 2002, : 142 - 151
  • [39] EFFICIENT OPTIMIZATION OF LARGE JOIN QUERIES USING TABU SEARCH
    MATYSIAK, M
    INFORMATION SCIENCES, 1995, 83 (1-2) : 77 - 88
  • [40] An efficient mechanism for processing similarity search queries in sensor networks
    Chung, Yu-Chi
    Su, I-Fang
    Lee, Chiang
    INFORMATION SCIENCES, 2011, 181 (02) : 284 - 307