Efficient Online String Matching Based on Characters Distance Text Sampling

Cited by: 1
Authors
Faro, Simone [1]
Marino, Francesco Pio [1]
Pavone, Arianna [2]
Affiliations
[1] Univ Catania, Dipartimento Matemat & Informat, Viale A Doria 6, I-95125 Catania, Italy
[2] Univ Messina, Dipartimento Sci Cognit, Via Concez 6, I-98122 Messina, Italy
Keywords
String matching; Text processing; Efficient searching; Text indexing
DOI
10.1007/s00453-020-00732-4
Chinese Library Classification number
TP31 [Computer software]
Subject classification codes
081202; 0835
Abstract
Searching for all occurrences of a pattern in a text is a fundamental problem in computer science, with applications in many other fields such as natural language processing, information retrieval and computational biology. Sampled string matching is an efficient, recently introduced approach that, on the one hand, avoids the prohibitive space requirements of building an index and, on the other hand, drastically reduces the search time of online solutions. In this paper we present a new algorithm for the sampled string matching problem, based on a characters distance sampling approach. The main idea is to sample the distances between consecutive occurrences of a given pivot character, and then to search the sampled data online for any occurrence of the sampled pattern before verifying against the original text. From a theoretical point of view, we prove that, under suitable conditions, our solution achieves both linear worst-case time complexity and optimal average-time complexity. From a practical point of view, our solution shows sub-linear behaviour in practice and speeds up online searching by a factor of up to 9, using limited additional space ranging from 11% down to 2.8% of the text size, a gain of up to 50% compared with previous solutions.
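The sample-search-verify idea in the abstract can be illustrated with a short sketch. It assumes a single pivot character chosen by the caller and uses hypothetical helper names (`pivot_positions`, `distance_sample`, `sampled_search`). It is not the authors' implementation: how the paper selects the pivot, which online matcher it runs on the sampled data, and how it treats patterns with fewer than two pivot occurrences are not reproduced here. The sketch also rebuilds the text sample on every call, whereas the point of the approach is to build it once and reuse it.

```python
# Simplified single-pivot sketch of characters distance sampling.
# Helper names are hypothetical; this is not the paper's implementation.

def pivot_positions(s: str, pivot: str) -> list[int]:
    """Positions of every occurrence of the pivot character in s."""
    return [i for i, ch in enumerate(s) if ch == pivot]


def distance_sample(positions: list[int]) -> list[int]:
    """Distances between consecutive pivot occurrences."""
    return [b - a for a, b in zip(positions, positions[1:])]


def sampled_search(text: str, pattern: str, pivot: str) -> list[int]:
    """Return the start positions of all occurrences of pattern in text."""
    m = len(pattern)
    p_pos = pivot_positions(pattern, pivot)
    if len(p_pos) < 2:
        # Too few pivots to form a distance sample: plain scan (assumed fallback).
        return [i for i in range(len(text) - m + 1) if text[i:i + m] == pattern]

    # Sampled text: in practice built once, offline, and stored.
    t_pos = pivot_positions(text, pivot)
    t_dist = distance_sample(t_pos)
    p_dist = distance_sample(p_pos)
    k = len(p_dist)

    result = []
    # Naive search of the sampled pattern in the sampled text (for brevity).
    for j in range(len(t_dist) - k + 1):
        if t_dist[j:j + k] == p_dist:
            # Align the pattern's first pivot with the j-th pivot of the text.
            start = t_pos[j] - p_pos[0]
            # Verification against the original text.
            if 0 <= start <= len(text) - m and text[start:start + m] == pattern:
                result.append(start)
    return result
```

For example, with pivot `"a"`, the text `"abracadabra"` samples to the distances `[3, 2, 2, 3]` and the pattern `"abra"` to `[3]`. The two sampled matches are verified as occurrences at positions 0 and 7. Every true occurrence of a pattern containing at least two pivots maps its pivots onto consecutive pivots of the text, so no occurrence is missed. Sampled matches that fail verification are discarded.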
Pages: 3390-3412
Number of pages: 23