Pattern Matching on Grammar-Compressed Strings in Linear Time

Cited by: 0
Authors
Ganardi, Moses [1]
Gawrychowski, Pawel [2]
Affiliations
[1] Max Planck Inst Software Syst MPI SWS, Saarbrucken, Germany
[2] Univ Wroclaw, Inst Comp Sci, Wroclaw, Poland
Keywords
DOI
Not available
Chinese Library Classification
TP301 [Theory and methods];
Subject classification code
081202;
Abstract
The most fundamental problem considered in algorithms for text processing is pattern matching: given a pattern p of length m and a text t of length n, does p occur in t? Multiple versions of this basic question have been considered, and by now we know algorithms that are fast both in practice and in theory. However, the rapid increase in the amount of generated and stored data creates a need for algorithms that operate directly on compressed representations of data. In the compressed pattern matching problem we are given a compressed representation of the text, with n being the length of the compressed representation and N being the length of the text, and an uncompressed pattern of length m. The most challenging scenario (and yet a relevant one when working with highly repetitive data, say biological information) is when the chosen compression method is capable of describing a string of exponential length (in the size of its representation). An elegant formalism for such a compression method is that of straight-line programs, which are simply context-free grammars describing exactly one string. While it has been known that the compressed pattern matching problem can be solved in O(m + n log N) time for this compression method, designing a linear-time algorithm remained open. We resolve this open question by presenting an O(n + m) time algorithm that, given a context-free grammar of size n that produces a single string t and a pattern p of length m, decides whether p occurs in t as a substring. To this end, we devise improved solutions for the weighted ancestor problem and the substring concatenation problem.
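To make the notion of a straight-line program concrete, the following is a minimal Python sketch of a grammar that derives exactly one string, together with a naive baseline that decompresses the text before searching. It is not the O(n + m) algorithm from the paper. The rule encoding and the names `doubling_grammar`, `expansion_lengths`, `expand`, and `naive_occurs` are assumptions introduced here purely for illustration.

```python
from typing import Dict, Tuple, Union

# Hypothetical encoding: a rule is either one terminal character
# or a pair of nonterminals that were defined earlier.
Rule = Union[str, Tuple[str, str]]


def doubling_grammar(k: int) -> Dict[str, Rule]:
    """Build a grammar with k + 3 rules whose start symbol X{k}
    derives the string "ab" repeated 2**k times."""
    rules: Dict[str, Rule] = {"A": "a", "B": "b", "X0": ("A", "B")}
    for i in range(1, k + 1):
        rules[f"X{i}"] = (f"X{i - 1}", f"X{i - 1}")
    return rules


def expansion_lengths(rules: Dict[str, Rule]) -> Dict[str, int]:
    """Length of every nonterminal's expansion, computed in time
    linear in the grammar size, without decompressing anything.
    Assumes rules are inserted in bottom-up order."""
    lengths: Dict[str, int] = {}
    for name, rule in rules.items():
        if isinstance(rule, str):
            lengths[name] = 1
        else:
            left, right = rule
            lengths[name] = lengths[left] + lengths[right]
    return lengths


def expand(rules: Dict[str, Rule], start: str) -> str:
    """Fully decompress the text. Time and space grow with the text
    length N, which can be exponential in the grammar size."""
    text: Dict[str, str] = {}
    for name, rule in rules.items():
        if isinstance(rule, str):
            text[name] = rule
        else:
            text[name] = text[rule[0]] + text[rule[1]]
    return text[start]


def naive_occurs(rules: Dict[str, Rule], start: str, pattern: str) -> bool:
    """Baseline only: decompress, then search the plain text."""
    return pattern in expand(rules, start)


if __name__ == "__main__":
    rules = doubling_grammar(10)
    print(len(rules), expansion_lengths(rules)["X10"])
    print(naive_occurs(rules, "X10", "baba"), naive_occurs(rules, "X10", "aa"))
```

In this sketch the grammar has n = k + 3 rules while the derived text has length N = 2^(k+1), which illustrates why decompressing first can be exponentially slower than an algorithm that works on the grammar directly.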
Pages: 2833 - 2846
Page count: 14
Related papers
50 in total
  • [1] Bookmarks in Grammar-Compressed Strings
    Cording, Patrick Hagge
    Gawrychowski, Pawel
    Weimann, Oren
    [J]. STRING PROCESSING AND INFORMATION RETRIEVAL, SPIRE 2016, 2016, 9954 : 153 - 159
  • [2] Algorithms on Grammar-Compressed Strings
    Landau, Gad M.
    [J]. COMBINATORIAL PATTERN MATCHING, 22ND ANNUAL SYMPOSIUM, CPM 2011, 2011, 6661 : 1 - 1
  • [3] Finger Search in Grammar-Compressed Strings
    Bille, Philip
    Christiansen, Anders Roy
    Cording, Patrick Hagge
    Gørtz, Inge Li
    [J]. THEORY OF COMPUTING SYSTEMS, 2018, 62 (08) : 1715 - 1735
  • [4] Detecting regularities on grammar-compressed strings
    Tomohiro, I
    Matsubara, Wataru
    Shimohira, Kouji
    Inenaga, Shunsuke
    Bannai, Hideo
    Takeda, Masayuki
    Narisawa, Kazuyuki
    Shinohara, Ayumi
    [J]. INFORMATION AND COMPUTATION, 2015, 240 : 74 - 89
  • [5] Random Access to Grammar-Compressed Strings
    Bille, Philip
    Landau, Gad M.
    Raman, Rajeev
    Sadakane, Kunihiko
    Satti, Srinivasa Rao
    Weimann, Oren
    [J]. PROCEEDINGS OF THE TWENTY-SECOND ANNUAL ACM-SIAM SYMPOSIUM ON DISCRETE ALGORITHMS, 2011, : 373 - 389
  • [6] Detecting Regularities on Grammar-Compressed Strings
    Tomohiro, I
    Matsubara, Wataru
    Shimohira, Kouji
    Inenaga, Shunsuke
    Bannai, Hideo
    Takeda, Masayuki
    Narisawa, Kazuyuki
    Shinohara, Ayumi
    [J]. MATHEMATICAL FOUNDATIONS OF COMPUTER SCIENCE 2013, 2013, 8087 : 571 - 582
  • [7] Finger Search in Grammar-Compressed Strings
    Philip Bille
    Anders Roy Christiansen
    Patrick Hagge Cording
    Inge Li Gørtz
    [J]. Theory of Computing Systems, 2018, 62 : 1715 - 1735
  • [8] Access, Rank, and Select in Grammar-compressed Strings
    Belazzougui, Djamal
    Cording, Patrick Hagge
    Puglisi, Simon J.
    Tabei, Yasuo
    [J]. ALGORITHMS - ESA 2015, 2015, 9294 : 142 - 154
  • [9] RANDOM ACCESS TO GRAMMAR-COMPRESSED STRINGS AND TREES
    Bille, Philip
    Landau, Gad M.
    Raman, Rajeev
    Sadakane, Kunihiko
    Satti, Srinivasa Rao
    Weimann, Oren
    [J]. SIAM JOURNAL ON COMPUTING, 2015, 44 (03) : 513 - 539
  • [10] Pattern search in grammar-compressed graphs
    Boettcher, Stefan
    Hartel, Rita
    Peeters, Sven
    [J]. 2020 DATA COMPRESSION CONFERENCE (DCC 2020), 2020, : 361 - 361