Time/space efficient compressed pattern matching

Cited by: 0
Authors
Gasieniec, L [1 ]
Potapov, I [1 ]
Affiliations
[1] Univ Liverpool, Dept Comp Sci, Liverpool L69 7ZF, Merseyside, England
Keywords
compressed pattern matching; straight-line program; directed acyclic graph traversal; small extra space;
DOI
Not available
Chinese Library Classification (CLC) number
TP31 [Computer software];
Discipline classification codes
081202; 0835;
Abstract
The exact pattern matching problem is to find all occurrences of a pattern p in a text t. A pattern matching algorithm is said to be optimal if its running time is linear in the sizes of t and p, i.e., O(|t| + |p|). Perhaps one of the most interesting settings of the pattern matching problem is the design of an efficient algorithm that uses only a small amount of extra space. In this paper we explore this setting to the extreme. We work under the assumption that the text t is available only in a compressed form, represented by a straight-line program. Compression methods based on the efficient construction of straight-line programs are as competitive as the compression standards, including the Lempel-Ziv compression scheme and the recently intensively studied text compression via block sorting, due to Burrows and Wheeler. Our main result is an algorithm that solves the compressed string matching problem in optimal linear time, with the help of constant extra space. We also discuss an efficient implementation of a version of our algorithm, showing that the new concept may also have some interesting real applications. Our result is in contrast with many other compressed pattern matching algorithms, where the goal is to find all pattern occurrences in time related to the size of the compressed text. However, one must remember that all previous algorithms use at least linear extra memory (in the size of the compressed text, a dictionary, or the pattern), while our algorithm can be implemented in constant extra space. Also, from the practical point of view, when the compression ratio is constant (very rarely smaller than 25%), there is no dramatic difference between a running time based on the size of the compressed text and one based on the size of the original text, whereas extra space resources might be strictly limited.
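To make the setting concrete, the following is a minimal sketch of pattern matching over a text given as a straight-line program (SLP). It is not the constant-space algorithm of the paper: it is a plain baseline that streams the characters of t by a left-to-right traversal of the SLP's directed acyclic graph and feeds them to a Knuth-Morris-Pratt matcher. It runs in O(|t| + |p|) time but uses O(|p|) extra space for the failure table plus a traversal stack proportional to the depth of the SLP. The rule encoding (a dict mapping a rule name to either a single character or a pair of rule names) and all function names are assumptions made for illustration.

```python
from typing import Dict, Iterator, List, Tuple, Union

# Hypothetical encoding: a rule is a terminal character or a pair of rule names.
Rule = Union[str, Tuple[str, str]]


def expand(slp: Dict[str, Rule], start: str) -> Iterator[str]:
    """Yield the characters of the text derived from `start`, left to right."""
    stack: List[str] = [start]
    while stack:
        rule = slp[stack.pop()]
        if isinstance(rule, str):
            yield rule                # terminal: emit one character
        else:
            left, right = rule
            stack.append(right)       # popped after the whole left subtree
            stack.append(left)


def failure_table(p: str) -> List[int]:
    """KMP failure function: fail[i] is the longest proper border of p[:i+1]."""
    fail = [0] * len(p)
    k = 0
    for i in range(1, len(p)):
        while k > 0 and p[i] != p[k]:
            k = fail[k - 1]
        if p[i] == p[k]:
            k += 1
        fail[i] = k
    return fail


def find_occurrences(slp: Dict[str, Rule], start: str, p: str) -> List[int]:
    """Return the start positions of all occurrences of p in the expanded text."""
    if not p:
        raise ValueError("pattern must be non-empty")
    fail = failure_table(p)
    hits: List[int] = []
    k = 0                             # number of pattern characters matched so far
    for pos, c in enumerate(expand(slp, start)):
        while k > 0 and c != p[k]:
            k = fail[k - 1]
        if c == p[k]:
            k += 1
        if k == len(p):
            hits.append(pos - len(p) + 1)
            k = fail[k - 1]           # allow overlapping occurrences
    return hits


# Example SLP deriving the Fibonacci word "abaababa" from rule X6.
FIB: Dict[str, Rule] = {
    "X1": "b",
    "X2": "a",
    "X3": ("X2", "X1"),
    "X4": ("X3", "X2"),
    "X5": ("X4", "X3"),
    "X6": ("X5", "X4"),
}

print(find_occurrences(FIB, "X6", "aba"))  # prints [0, 3, 5]
```

The text is never materialised in full, only streamed one character at a time. What the sketch does not achieve is the paper's space bound: both the failure table and the traversal stack would have to be eliminated to reach constant extra space.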
Pages: 137-154
Page count: 18
Related papers
50 in total
  • [1] Improving time and space complexity for compressed pattern matching
    Maruyama, Shirou
    Miyagawa, Hiromitsu
    Sakamoto, Hiroshi
    [J]. ALGORITHMS AND COMPUTATION, PROCEEDINGS, 2006, 4288 : 484 - +
  • [2] A RUN-TIME EFFICIENT IMPLEMENTATION OF COMPRESSED PATTERN MATCHING AUTOMATA
    Matsumoto, Tetsuya
    Hagio, Kazuhito
    Takeda, Masayuki
    [J]. INTERNATIONAL JOURNAL OF FOUNDATIONS OF COMPUTER SCIENCE, 2009, 20 (04) : 717 - 733
  • [3] A run-time efficient implementation of compressed pattern matching automata
    Matsumoto, Tetsuya
    Hagio, Kazuhito
    Takeda, Masayuki
    [J]. IMPLEMENTATION AND APPLICATION OF AUTOMATA, PROCEEDINGS, 2008, 5148 : 201 - 211
  • [4] An efficient pattern matching scheme in LZW compressed sequences
    Lee, Tsern-Huei
    Huang, Nai-Lun
    [J]. SECURITY AND COMMUNICATION NETWORKS, 2008, 1 (04) : 325 - 335
  • [5] Simple and efficient LZW-compressed multiple pattern matching
    Gawrychowski, Pawel
    [J]. JOURNAL OF DISCRETE ALGORITHMS, 2014, 25 : 34 - 41
  • [6] Efficient Parameterized Pattern Matching in Sublinear Space
    Ideguchi, Haruki
    Hendrian, Diptarama
    Yoshinaka, Ryo
    Shinohara, Ayumi
    [J]. STRING PROCESSING AND INFORMATION RETRIEVAL, SPIRE 2023, 2023, 14240 : 271 - 283
  • [7] Pattern Matching on Grammar-Compressed Strings in Linear Time
    Ganardi, Moses
    Gawrychowski, Pawel
    [J]. PROCEEDINGS OF THE 2022 ANNUAL ACM-SIAM SYMPOSIUM ON DISCRETE ALGORITHMS, SODA, 2022, : 2833 - 2846
  • [8] Compressed pattern matching for SEQUITUR
    Mitarai, S
    Hirao, M
    Matsumoto, T
    Shinohara, A
    Takeda, M
    Arikawa, S
    [J]. DCC 2001: DATA COMPRESSION CONFERENCE, PROCEEDINGS, 2001, : 469 - 478
  • [9] Compressed Parameterized Pattern Matching
    Beal, Richard
    Adjeroh, Donald A.
    [J]. 2013 DATA COMPRESSION CONFERENCE (DCC), 2013, : 461 - 470
  • [10] Compressed Consecutive Pattern Matching
    Gawrychowski, Pawel
    Gourdel, Garance
    Starikovskaya, Tatiana
    Steiner, Teresa Anna
    [J]. 2024 DATA COMPRESSION CONFERENCE, DCC, 2024, : 163 - 172