Document Retrieval on Repetitive Collections

Authors
Navarro, Gonzalo [1]
Puglisi, Simon J. [2]
Siren, Jouni [1]
Affiliations
[1] Univ Chile, Dept Comp Sci, Ctr Biotechnol & Bioengn, Santiago, Chile
[2] Univ Helsinki, Helsinki, Finland
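Source
ALGORITHMS - ESA 2014 | 2014 / Vol. 8737
Funding
Academy of Finland;
Keywords
DOI
Not available
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract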
Document retrieval aims at finding the most important documents where a pattern appears in a collection of strings. Traditional pattern-matching techniques yield brute-force document retrieval solutions, which has motivated the research on tailored indexes that offer near-optimal performance. However, an experimental study establishing which alternatives are actually better than brute force, and which perform best depending on the collection characteristics, has not been carried out. In this paper we address this shortcoming by exploring the relationship between the nature of the underlying collection and the performance of current methods. Via extensive experiments we show that established solutions are often beaten in practice by brute-force alternatives. We also design new methods that offer superior time/space tradeoffs, particularly on repetitive collections.
Pages: 725-736
Page count: 12