The suffix-signature method for searching for phrases in text

Cited by: 2
Authors
Zhou, M
Tompa, FW
Affiliations
[1] Open Text Corp, Waterloo, ON N2L 5Z5, Canada
[2] Univ Waterloo, Dept Comp Sci, Waterloo, ON N2L 3G1, Canada
Keywords
text indexing; phrase search; suffix arrays; PAT arrays; signature arrays
DOI
10.1016/S0306-4379(98)00029-5
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
We present a new algorithm to find all occurrences of a given phrase based on the data structure known as a suffix array and using a corresponding array of signatures. With this algorithm, matches to phrases of moderate length can be found with expected search time of one disk access to the text and one disk access to its index. To achieve this performance for phrases of up to five words in length requires an index having total size of approximately 120% of the size of the text. The algorithm guarantees a worst case search performance of two disk accesses to the text per phrase search. Experiments with actual data ranging in size from 6Mb to 550Mb and with actual query patterns derived from logs of searches on the World Wide Web show that the approach is applicable in practice to a variety of texts and realistic phrase searches. © 1998 Elsevier Science Ltd. All rights reserved.
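The abstract does not spell out how the signatures are built or searched, so the following is only an illustrative in-memory sketch of the general idea of pairing a word-level suffix array with a parallel signature array. It is not the authors' algorithm and does not reproduce their disk-access guarantees. The names `build_index`, `find_phrase`, and `sig_len` are hypothetical, and the signature used here (a fixed-length prefix of each suffix) is an assumption chosen because truncation keeps the signatures in sorted order. The counter `text_reads` stands in for accesses to the text.

```python
from bisect import bisect_left


def build_index(text, sig_len=8):
    """Build (normalized text, suffix array, signature array, sig_len).

    Assumption: a signature is the first sig_len characters of a suffix.
    """
    # Normalize so that every word is followed by exactly one space.
    norm = " ".join(text.lower().split()) + " "
    # Word-start offsets: offset 0 and every offset right after a space.
    starts = [0] + [i + 1 for i, ch in enumerate(norm[:-1]) if ch == " "]
    # Suffix array: word starts ordered by the text that follows them.
    sa = sorted(starts, key=lambda p: norm[p:])
    # Signature array, parallel to sa; truncation preserves sorted order.
    sigs = [norm[p:p + sig_len] for p in sa]
    return norm, sa, sigs, sig_len


def find_phrase(index, phrase):
    """Return (sorted character offsets of whole-word matches, text_reads)."""
    norm, sa, sigs, sig_len = index
    query = " ".join(phrase.lower().split()) + " "
    if query == " ":
        return [], 0
    head = query[:sig_len]
    # Smallest string greater than every string that starts with head.
    upper = head[:-1] + chr(ord(head[-1]) + 1)
    # Step 1: narrow the range using the signature array only.
    lo = bisect_left(sigs, head)
    hi = bisect_left(sigs, upper)
    text_reads = 0
    if len(query) > sig_len:
        # Step 2: the signature is too short to decide, so refine the
        # candidate range by comparing against the text itself.
        def probe(i):
            nonlocal text_reads
            text_reads += 1
            return norm[sa[i]:sa[i] + len(query)]

        a, b = lo, hi
        while a < b:  # first candidate whose text is >= query
            m = (a + b) // 2
            if probe(m) < query:
                a = m + 1
            else:
                b = m
        first, b = a, hi
        while a < b:  # first candidate whose text is > query
            m = (a + b) // 2
            if probe(m) <= query:
                a = m + 1
            else:
                b = m
        lo, hi = first, a
    return sorted(sa[lo:hi]), text_reads
```

In this sketch a phrase no longer than `sig_len` characters (including its trailing space) is resolved from the signature array alone, while longer phrases fall back to comparisons against the text. A longer signature trades index size for fewer text comparisons, which loosely mirrors the space and access trade-off the abstract reports.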
Pages: 567-588
Page count: 22