Approximating Document Frequency for Self-Index based Top-k Document Retrieval

Cited by: 1
Authors
Suzuki, Tokinori [1]
Fujii, Atsushi [1]
Affiliations
[1] Tokyo Institute of Technology, Department of Computer Science, Meguro-ku, 2-12-1 Ookayama, Tokyo, Japan
Keywords
approximate search; FM-index; wavelet tree
DOI
10.1109/WAINA.2015.68
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Top-k document retrieval, which returns the documents most relevant to a query, is an essential task for many applications. One promising index framework for efficient top-k document retrieval combines an FM-index with a wavelet tree. This index, however, has difficulty handling document frequency (DF) at search time because the indexed terms are all substrings of the document collection. Previous work either computes DF by exhaustively searching every part of the index, even though most of the documents there are not relevant, or stores precalculated DF values at a large additional space cost. In this paper, we propose two methods that approximate the DF of a query term by exploiting information obtained while traversing the index structures. Experimental results showed that our methods achieved almost the same effectiveness as exhaustive search while taking about half of its search time.
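To make the DF problem in the abstract concrete, here is a minimal Python sketch of the exact, exhaustive baseline. It is not the authors' approximation method, and it substitutes a plain suffix array and document array for the FM-index and wavelet tree. The function names, the `\x00` separator, and the toy documents are all assumptions made for illustration. The point is that the suffix range gives the total occurrence count immediately, while exact DF needs a pass over every entry in the range.

```python
from bisect import bisect_left, bisect_right


def build_index(docs):
    """Build a naive suffix array over the concatenated collection,
    plus a parallel document array (document id of each suffix)."""
    text = ""
    doc_of_pos = []
    for doc_id, doc in enumerate(docs):
        piece = doc + "\x00"  # separator, assumed absent from the documents
        text += piece
        doc_of_pos.extend([doc_id] * len(piece))
    # O(n^2 log n) construction: fine for a toy example only.
    suffix_array = sorted(range(len(text)), key=lambda i: text[i:])
    doc_array = [doc_of_pos[i] for i in suffix_array]
    return text, suffix_array, doc_array


def suffix_range(text, suffix_array, pattern):
    """Return the half-open range [lo, hi) of suffixes that start with pattern."""
    m = len(pattern)
    # Truncated prefixes of sorted suffixes are themselves in sorted order.
    prefixes = [text[i:i + m] for i in suffix_array]
    return bisect_left(prefixes, pattern), bisect_right(prefixes, pattern)


def exact_df(doc_array, lo, hi):
    """Exact DF: distinct document ids in the range. This scans every
    occurrence, which is the cost an approximation tries to avoid."""
    return len(set(doc_array[lo:hi]))


docs = ["abracadabra", "cadabra", "banana"]
text, sa, da = build_index(docs)
lo, hi = suffix_range(text, sa, "abra")
occurrences = hi - lo          # known from the range alone
df = exact_df(da, lo, hi)      # requires visiting the whole range
```

Because any substring can be a query term, DF values cannot be tabulated per term as cheaply as in a word-based inverted index, which is the trade-off between exhaustive search and extra space that the abstract describes.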
Pages: 541-546
Number of pages: 6