Sparse Regular Expression Matching

Cited by: 0
Authors
Bille, Philip [1]
Gortz, Inge Li [1]
Affiliations
[1] Tech Univ Denmark, Lyngby, Denmark
Keywords
ALGORITHM; COMPLEXITY; DERIVATIVES; FOLLOW
DOI
Not available
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
A regular expression specifies a set of strings formed by single characters combined with concatenation, union, and Kleene star operators. Given a regular expression R and a string Q, the regular expression matching problem is to decide if Q matches any of the strings specified by R. Regular expressions are a fundamental concept in formal languages, and regular expression matching is a basic primitive for searching and processing data. A standard textbook solution [Thompson, CACM 1968] constructs and simulates a nondeterministic finite automaton (NFA), leading to an $O(nm)$ time algorithm, where n is the length of Q and m is the length of R. Despite considerable research efforts, only polylogarithmic improvements of this bound are known. Recently, conditional lower bounds provided evidence for this lack of progress when Backurs and Indyk [FOCS 2016] proved that, assuming the strong exponential time hypothesis (SETH), regular expression matching cannot be solved in $O((nm)^{1-\epsilon})$ time, for any constant $\epsilon > 0$. Hence, the complexity of regular expression matching is essentially settled in terms of n and m. In this paper, we take a new approach and introduce a density parameter, $\Delta$, that captures the amount of nondeterminism in the NFA simulation on Q. The density is at most $nm + 1$ but can be significantly smaller. Our main result is a new algorithm that solves regular expression matching in $O(\Delta \log\log \frac{nm}{\Delta} + n + m)$ time. This essentially replaces $nm$ with $\Delta$ in the complexity of regular expression matching. We complement our upper bound by a matching conditional lower bound that proves that we cannot solve regular expression matching in time $O(\Delta^{1-\epsilon})$ for any constant $\epsilon > 0$ assuming SETH. The key technical contribution in the result is a new linear space representation of the classic position automaton that supports fast state-set transition computation in near-linear time in the size of the input and output state sets. To achieve this we develop several new insights and techniques of independent interest, including new structural properties of the parse trees of regular expressions, a decomposition of state-set transitions based on parse trees, and a fast batched predecessor data structure.
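To make the state-set simulation mentioned in the abstract concrete, here is a minimal Python sketch. It builds the classic position automaton for a small regular expression and simulates it on a string, summing the sizes of the state sets it visits. This sum is only an illustrative proxy for the amount of nondeterminism. The abstract does not give the formal definition of $\Delta$, so the counting rule, the tuple-based AST encoding, and the names `glushkov` and `match_with_density` are all assumptions made for this example. The naive transition step below is also not the paper's algorithm and does not achieve its time bound.

```python
from itertools import count

# Hypothetical AST encoding (an assumption for this sketch):
#   ("chr", c), ("cat", a, b), ("alt", a, b), ("star", a)

def glushkov(ast):
    """Compute symbols, follow sets, nullability, first and last sets
    of the position automaton for the given regex AST."""
    symbols = {}   # position -> character labelling that position
    follow = {}    # position -> positions that may come directly after it
    ids = count(1) # positions are numbered 1, 2, ...; 0 is the start state

    def walk(node):
        kind = node[0]
        if kind == "chr":
            p = next(ids)
            symbols[p] = node[1]
            follow[p] = set()
            return False, {p}, {p}
        if kind == "star":
            _, first, last = walk(node[1])
            for p in last:            # loop back from the end to the start
                follow[p] |= first
            return True, first, last
        n1, f1, l1 = walk(node[1])
        n2, f2, l2 = walk(node[2])
        if kind == "alt":
            return n1 or n2, f1 | f2, l1 | l2
        # Concatenation: ends of the left part lead to starts of the right.
        for p in l1:
            follow[p] |= f2
        first = f1 | f2 if n1 else f1
        last = l1 | l2 if n2 else l2
        return n1 and n2, first, last

    nullable, first, last = walk(ast)
    return symbols, follow, nullable, first, last

def match_with_density(ast, text):
    """Simulate the position automaton on text.
    Returns (accepted, total), where total is the sum of the sizes of all
    state sets, including the initial one (an illustrative proxy only)."""
    symbols, follow, nullable, first, last = glushkov(ast)
    START = 0
    current = {START}
    total = 1
    for ch in text:
        nxt = set()
        for s in current:
            candidates = first if s == START else follow[s]
            nxt.update(p for p in candidates if symbols[p] == ch)
        current = nxt
        total += len(current)
    accepted = (START in current and nullable) or any(s in last for s in current)
    return accepted, total

# R = (a|b)*a has m = 3 positions; Q = "abba" has length n = 4.
R = ("cat", ("star", ("alt", ("chr", "a"), ("chr", "b"))), ("chr", "a"))
print(match_with_density(R, "abba"))  # (True, 7)
```

In this example the total is 7, while the trivial bound $nm + 1$ is 13 (taking m as the number of positions), which illustrates how the work actually done by a state-set simulation can be well below the worst case.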
Pages: 3354-3375
Page count: 22