Sparse Regular Expression Matching

Authors
Bille, Philip [1]
Gørtz, Inge Li [1]
Affiliations
[1] Tech Univ Denmark, Lyngby, Denmark
Keywords
ALGORITHM; COMPLEXITY; DERIVATIVES; FOLLOW
DOI
Not available
Chinese Library Classification
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
A regular expression specifies a set of strings formed by single characters combined with concatenation, union, and Kleene star operators. Given a regular expression R and a string Q, the regular expression matching problem is to decide if Q matches any of the strings specified by R. Regular expressions are a fundamental concept in formal languages, and regular expression matching is a basic primitive for searching and processing data. A standard textbook solution [Thompson, CACM 1968] constructs and simulates a nondeterministic finite automaton (NFA), leading to an $O(nm)$ time algorithm, where n is the length of Q and m is the length of R. Despite considerable research efforts, only polylogarithmic improvements of this bound are known. Recently, conditional lower bounds provided evidence for this lack of progress when Backurs and Indyk [FOCS 2016] proved that, assuming the strong exponential time hypothesis (SETH), regular expression matching cannot be solved in $O((nm)^{1-\epsilon})$ time for any constant $\epsilon > 0$. Hence, the complexity of regular expression matching is essentially settled in terms of n and m. In this paper, we take a new approach and introduce a density parameter, $\Delta$, that captures the amount of nondeterminism in the NFA simulation on Q. The density is at most $nm + 1$ but can be significantly smaller. Our main result is a new algorithm that solves regular expression matching in $O(\Delta \log\log \frac{nm}{\Delta} + n + m)$ time. This essentially replaces $nm$ with $\Delta$ in the complexity of regular expression matching. We complement our upper bound with a matching conditional lower bound that proves that we cannot solve regular expression matching in time $O(\Delta^{1-\epsilon})$ for any constant $\epsilon > 0$, assuming SETH. The key technical contribution in the result is a new linear-space representation of the classic position automaton that supports fast state-set transition computation in near-linear time in the size of the input and output state sets. To achieve this, we develop several new insights and techniques of independent interest, including new structural properties of the parse trees of regular expressions, a decomposition of state-set transitions based on parse trees, and a fast batched predecessor data structure.
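To make the density parameter in the abstract above more concrete, the following is a minimal sketch, not the paper's algorithm. It builds the classic position (Glushkov) automaton for a small regular expression and simulates it state set by state set, summing the sizes of the state sets it visits. That sum is used here as an assumed stand-in for the density; the paper's exact definition may differ in details. The tuple-based AST format and the names `glushkov` and `match_with_density` are hypothetical and chosen only for this illustration. The simulation itself is the plain textbook one, so it does not achieve the running time claimed in the abstract.

```python
from itertools import count


def glushkov(ast):
    """Compute the position automaton of a regex AST.

    AST nodes (hypothetical format for this sketch):
      ("chr", c), ("cat", a, b), ("alt", a, b), ("star", a)
    Returns (nullable, first, last, follow, symbol).
    """
    symbol = {}  # position -> character at that position
    follow = {}  # position -> positions that may come directly after it
    ids = count(1)  # position 0 is reserved for the start state

    def walk(node):
        kind = node[0]
        if kind == "chr":
            p = next(ids)
            symbol[p] = node[1]
            follow[p] = set()
            return False, {p}, {p}
        if kind == "star":
            _, first, last = walk(node[1])
            for p in last:
                follow[p] |= first  # loop back to the start of the body
            return True, first, last
        n1, f1, l1 = walk(node[1])
        n2, f2, l2 = walk(node[2])
        if kind == "alt":
            return n1 or n2, f1 | f2, l1 | l2
        # kind == "cat": the end of the left part connects to the right part
        for p in l1:
            follow[p] |= f2
        first = (f1 | f2) if n1 else f1
        last = (l1 | l2) if n2 else l2
        return n1 and n2, first, last

    nullable, first, last = walk(ast)
    return nullable, first, last, follow, symbol


def match_with_density(ast, text):
    """Simulate the position automaton on text.

    Returns (accepted, density), where density is the total size of all
    state sets visited, including the initial one (an assumption of this
    sketch about how the density is counted).
    """
    nullable, first, last, follow, symbol = glushkov(ast)
    start = 0
    states = {start}
    density = 1  # the initial state set {start}
    for ch in text:
        nxt = set()
        for s in states:
            candidates = first if s == start else follow[s]
            nxt |= {p for p in candidates if symbol[p] == ch}
        states = nxt
        density += len(states)
    accepted = (start in states and nullable) or any(s in last for s in states)
    return accepted, density


# (a|b)*a as an AST
REGEX = ("cat", ("star", ("alt", ("chr", "a"), ("chr", "b"))), ("chr", "a"))
```

For example, `match_with_density(REGEX, "abba")` returns `(True, 7)`: the state sets visited have sizes 1, 2, 1, 1, 2. With n = 4 and three character positions, the trivial bound nm + 1 would be 13, which illustrates how the density can be smaller than the product of the input sizes.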
Pages: 3354 - 3375
Number of pages: 22
Related papers
50 in total
  • [11] Translating Regular Expression Matching into Transducers
    Minamide, Yasuhiko
    Sakuma, Yuto
    Voronkov, Andrei
    12TH INTERNATIONAL SYMPOSIUM ON SYMBOLIC AND NUMERIC ALGORITHMS FOR SCIENTIFIC COMPUTING (SYNASC 2010), 2011, : 107 - 115
  • [12] From regular expression matching to parsing
    Philip Bille
    Inge Li Gørtz
    Acta Informatica, 2022, 59 : 709 - 724
  • [13] An efficient sparse matrix format for accelerating regular expression matching on field-programmable gate arrays
    Jiang, Lei
    Tan, Jianlong
    Tang, Qiu
    SECURITY AND COMMUNICATION NETWORKS, 2015, 8 (01) : 13 - 24
  • [14] Regular expression matching in reconfigurable hardware
    Sourdis, Ioannis
    Vassiliadis, Stamatis
    Bispo, Joao
    Cardoso, Joao M. P.
    JOURNAL OF SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY, 2008, 51 (01) : 99 - 121
  • [15] Regular Expression matching with memristor TCAMs
    Graves, Catherine E.
    Ma, Wen
    Sheng, Xia
    Buchanan, Brent
    Zheng, Le
    Lam, Si-Ty
    Li, Xuema
    Chalamalasetti, Sai Rahul
    Kiyama, Lennie
    Foltin, Martin
    Hardy, Matthew P.
    Strachan, John Paul
    2018 IEEE INTERNATIONAL CONFERENCE ON REBOOTING COMPUTING (ICRC), 2018, : 242 - 252
  • [16] Regular expression pattern matching for XML
    Hosoya, H
    Pierce, B
    ACM SIGPLAN NOTICES, 2001, 36 (03) : 67 - 80
  • [17] Practical private regular expression matching
    Kerschbaum, Florian
    Security and Privacy in Dynamic Environments, 2006, 201 : 461 - 470
  • [18] Regular expression matching and operational semantics
    Rathnayake, Asiri
    Thielecke, Hayo
    Electronic Proceedings in Theoretical Computer Science, EPTCS, 2011, 62 : 31 - 45
  • [19] Regular Expression Matching and Operational Semantics
    Rathnayake, Asiri
    Thielecke, Hayo
    ELECTRONIC PROCEEDINGS IN THEORETICAL COMPUTER SCIENCE, 2011, (62) : 31 - 45
  • [20] Text Indexing for Regular Expression Matching
    Gibney, Daniel
    Thankachan, Sharma V.
    ALGORITHMS, 2021, 14 (05)