Efficient POSIX submatch extraction on nondeterministic finite automata

Cited by: 3
Authors
Borsotti, Angelo [1]
Trofimovich, Ulya [2]
Affiliations
[1] Polytech Univ Milan, Dept Elect Informat & Bioengn, Milan, Italy
[2] Belarusian State Univ, Dept Discrete Math & Algorithm, Minsk, Belarus
Source
SOFTWARE-PRACTICE & EXPERIENCE | 2021 / Vol. 51 / Issue 02
Keywords
finite-state automata; parsing; POSIX; regular expressions; submatch extraction
DOI
10.1002/spe.2881
Chinese Library Classification (CLC) number
TP31 [Computer software]
Subject classification codes
081202; 0835
Abstract
In this paper we study the performance of POSIX submatch extraction algorithms based on nondeterministic finite automata (NFA). We propose an algorithm that combines Laurikari tagged NFA and extended Okui-Suzuki disambiguation. The algorithm works in worst-case O(n m^2 t) time and O(m^2) space (including preprocessing), where n is the length of input, m is the size of the regular expression with bounded repetition expanded, and t is the number of capturing groups and subexpressions that contain them. On real-world benchmarks our algorithm performs close to the O(n m t) complexity of leftmost-greedy matching, although on artificial benchmarks it can be significantly slower. We propose a lazy version of the algorithm that runs much faster, but requires O(n m^2) space. We show that the Kuklewicz algorithm is slower in practice, and the backward matching algorithm proposed by Cox is incorrect.
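The abstract contrasts POSIX submatch extraction with leftmost-greedy matching. The sketch below is only an illustration of that distinction and is not the paper's algorithm: it uses Python's standard `re` module, which is a backtracking engine with leftmost-greedy (Perl-style) semantics, on a widely used textbook pattern. The pattern, the input string, and the helper name `greedy_groups` are assumptions chosen for illustration. The POSIX result is stated in a comment from the POSIX leftmost-longest subexpression rule, not computed by the code.

```python
import re


def greedy_groups(pattern: str, text: str):
    """Return the capture groups chosen by Python's leftmost-greedy engine.

    Hypothetical helper for illustration; returns None when there is no match.
    """
    match = re.fullmatch(pattern, text)
    return match.groups() if match else None


# Illustrative pattern and input (assumed, not taken from the paper).
PATTERN = r"(a|ab)(c|bcd)(d*)"
TEXT = "abcd"

# Leftmost-greedy: the first alternative that leads to an overall match wins,
# so group 1 takes "a", group 2 takes "bcd", and group 3 is empty.
groups = greedy_groups(PATTERN, TEXT)
print(groups)

# Under POSIX rules, earlier subexpressions are made as long as possible,
# so the expected submatches would be ("ab", "c", "d") for the same overall
# match. Python's `re` does not implement that rule.
```

Both engines agree on the overall match `abcd`; they differ only in how the ambiguity among capture groups is resolved, which is the disambiguation problem the abstract refers to.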
Pages: 159-192
Page count: 34