Efficient Online String Matching Based on Characters Distance Text Sampling

Cited by: 1
Authors
Faro, Simone [1]
Marino, Francesco Pio [1]
Pavone, Arianna [2]
Affiliations
[1] Univ Catania, Dipartimento Matemat & Informat, Viale A Doria 6, I-95125 Catania, Italy
[2] Univ Messina, Dipartimento Sci Cognit, Via Concez 6, I-98122 Messina, Italy
Keywords
String matching; Text processing; Efficient searching; Text indexing
DOI
10.1007/s00453-020-00732-4
Chinese Library Classification number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
Searching for all occurrences of a pattern in a text is a fundamental problem in computer science with applications in many other fields, such as natural language processing, information retrieval and computational biology. Sampled string matching is an efficient approach recently introduced in order to overcome the prohibitive space requirements of index construction, on the one hand, and to drastically reduce the searching time of online solutions, on the other hand. In this paper we present a new algorithm for the sampled string matching problem, based on a characters distance sampling approach. The main idea is to sample the distances between consecutive occurrences of a given pivot character and then to search the sampled data online for any occurrence of the sampled pattern, before verifying the original text. From a theoretical point of view we prove that, under suitable conditions, our solution can achieve both linear worst-case time complexity and optimal average-time complexity. From a practical point of view it turns out that our solution shows sub-linear behaviour in practice and speeds up online searching by a factor of up to 9, using limited additional space ranging from 11% down to 2.8% of the text size, with a gain of up to 50% compared with previous solutions.
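The sampling idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (`pivot_positions`, `distances`, `naive_search`, `sampled_search`) are hypothetical, the sampled sequences are scanned with a plain sliding comparison rather than an optimized matcher, and the fallback to a direct search when the pattern contains fewer than two pivot occurrences is an assumption made here to keep the example complete.

```python
def pivot_positions(s, pivot):
    """Positions of every occurrence of the pivot character in s."""
    return [i for i, ch in enumerate(s) if ch == pivot]


def distances(positions):
    """Distances between consecutive pivot positions (the sampled data)."""
    return [b - a for a, b in zip(positions, positions[1:])]


def naive_search(text, pattern):
    """All (possibly overlapping) occurrences, found by direct search."""
    hits = []
    i = text.find(pattern)
    while i != -1:
        hits.append(i)
        i = text.find(pattern, i + 1)
    return hits


def sampled_search(text, pattern, pivot):
    """Search the sampled distances first, then verify in the original text."""
    text_pos = pivot_positions(text, pivot)
    pat_pos = pivot_positions(pattern, pivot)

    # Assumption of this sketch: with fewer than two pivots in the pattern
    # there is no distance to match, so fall back to a direct search.
    if len(pat_pos) < 2:
        return naive_search(text, pattern)

    text_dist = distances(text_pos)
    pat_dist = distances(pat_pos)
    k = len(pat_dist)

    hits = []
    for j in range(len(text_dist) - k + 1):
        # Candidate: the sampled pattern occurs in the sampled text at index j.
        if text_dist[j:j + k] == pat_dist:
            # Align the first pivot of the pattern with the j-th pivot of the text.
            start = text_pos[j] - pat_pos[0]
            # Verification step against the original text.
            if start >= 0 and text.startswith(pattern, start):
                hits.append(start)
    return hits


if __name__ == "__main__":
    print(sampled_search("abracadabra", "abra", "a"))
    print(sampled_search("abracadabra", "aca", "a"))
```

In this sketch a match in the sampled data only yields a candidate position; the verification step is what rejects false candidates, such as the window `ada` when searching for `aca`, whose pivot distances coincide.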
Pages: 3390-3412
Number of pages: 23
Related papers
50 in total
  • [1] Efficient Online String Matching Based on Characters Distance Text Sampling
    Simone Faro
    Francesco Pio Marino
    Arianna Pavone
    Algorithmica, 2020, 82 : 3390 - 3412
  • [2] Improved characters distance sampling for online and offline text searching
    Faro, Simone
    Marino, Francesco Pio
    Pavone, Arianna
    THEORETICAL COMPUTER SCIENCE, 2023, 946
  • [3] Online Pattern Matching for String Edit Distance with Moves
    Takabatake, Yoshimasa
    Tabei, Yasuo
    Sakamoto, Hiroshi
    STRING PROCESSING AND INFORMATION RETRIEVAL, SPIRE 2014, 2014, 8799 : 203 - 214
  • [4] A Fast String Matching Algorithm Based on Lowlight Characters in the Pattern
    Cao, Zhengjun
    Yan, Zhenzhen
    Liu, Lihua
    2015 SEVENTH INTERNATIONAL CONFERENCE ON ADVANCED COMPUTATIONAL INTELLIGENCE (ICACI), 2015, : 179 - 182
  • [5] String matching with alphabet sampling
    Claude, Francisco
    Navarro, Gonzalo
    Peltola, Hannu
    Salmela, Leena
    Tarhio, Jorma
    JOURNAL OF DISCRETE ALGORITHMS, 2012, 11 : 37 - 50
  • [6] Online public opinion hotspot detection and analysis based on short text clustering using string distance
    Yang, Zhen
    Duan, Li-Juan
    Lai, Ying-Xu
    Beijing Gongye Daxue Xuebao/Journal of Beijing University of Technology, 2010, 36 (05): : 669 - 673
  • [7] Online signature verification based on string edit distance
    Kaspar Riesen
    Roman Schmidt
    International Journal on Document Analysis and Recognition (IJDAR), 2019, 22 : 41 - 54
  • [8] Online signature verification based on string edit distance
    Riesen, Kaspar
    Schmidt, Roman
    INTERNATIONAL JOURNAL ON DOCUMENT ANALYSIS AND RECOGNITION, 2019, 22 (01) : 41 - 54
  • [9] STRING MATCHING WITH PREPROCESSING OF TEXT AND PATTERN
    NAOR, M
    LECTURE NOTES IN COMPUTER SCIENCE, 1991, 510 : 739 - 750
  • [10] String Matching Using a Distance Function
    Bisht, Raj Kishor
    JOURNAL OF QUANTITATIVE LINGUISTICS, 2015, 22 (02) : 87 - 100