Text indexing with errors

被引:0
|
作者
Maass, MG [1 ]
Nowak, J [1 ]
机构
[1] Tech Univ Munich, Fak Informat, D-85748 Garching, Germany
来源
COMBINATORIAL PATTERN MATCHING, PROCEEDINGS | 2005年 / 3537卷
关键词
D O I
暂无
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
In this paper we address the problem of constructing an index for a text document or a collection of documents to answer various questions about the occurrences of a pattern when allowing a constant number of errors. In particular, our index can be built to report all occurrences, all positions, or all documents where a pattern occurs in time linear in the size of the query string and the number of results. This improves over previous work where the lookup time is not linear or depends upon the size of the document corpus. Our data structure has size O(n log(k) n) on average and with high probability for input size n and queries with up to k errors. Additionally, we present a trade-off between query time and index complexity that achieves worst-case bounded index size and preprocessing time with linear lookup time on average.
引用
收藏
页码:21 / 32
页数:12
相关论文
共 50 条
  • [41] Retrieval experiments: Full text versus human indexing versus automatic indexing
    Lancaster, FW
    JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE, 1998, 49 (05): : 484 - 484
  • [42] Impact of indexing errors on spur gear dynamics
    Inalpolat, M.
    Handschuh, M.
    Kahraman, A.
    INTERNATIONAL GEAR CONFERENCE 2014, 2014, : 751 - +
  • [43] Random Indexing and Modified Random Indexing based approach for extractive text summarization
    Chatterjee, Niladri
    Sahoo, Pramod Kumar
    COMPUTER SPEECH AND LANGUAGE, 2015, 29 (01): : 32 - 44
  • [44] A novel full-text indexing model for Chinese text retrieval
    Zhou, SG
    Hu, YF
    Hu, JT
    DATABASE AND EXPERT SYSTEMS APPLICATIONS, 2001, 2113 : 370 - 379
  • [45] DyST: Dynamic and scalable temporal text indexing
    Norvag, Kjetil
    Nybo, Albert Overskeid
    TIME 2006: THIRTEENTH INTERNATIONAL SYMPOSIUM ON TEMPORAL REPRESENTATION AND REASONING, PROCEEDINGS, 2006, : 204 - +
  • [46] Indexing text and visual features for WWW images
    Shen, HT
    Zhou, XF
    Cui, B
    WEB TECHNOLOGIES RESEARCH AND DEVELOPMENT - APWEB 2005, 2005, 3399 : 885 - 899
  • [47] Indexing text events in digital video databases
    Gargi, U
    Antani, S
    Kasturi, R
    FOURTEENTH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION, VOLS 1 AND 2, 1998, : 916 - 918
  • [48] Efficient indexing for Query By String text retrieval
    Ghosh, Suman K.
    Gomez, Liuis
    Karatzas, Dimosthenis
    Valveny, Ernest
    2015 13TH IAPR INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION (ICDAR), 2015, : 1236 - 1240
  • [49] The impact of indexing approaches on Arabic text classification
    Al-Badarneh, Amer
    Al-Shawakfa, Emad
    Bani-Ismail, Basel
    Al-Rababah, Khaleel
    Shatnawi, Safwan
    JOURNAL OF INFORMATION SCIENCE, 2017, 43 (02) : 159 - 173
  • [50] RPCL text clustering based on concept indexing
    Gao, MT
    Wang, ZO
    PROCEEDINGS OF 2005 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-9, 2005, : 2331 - 2335