Efficient Regular Expression Matching on Compressed Strings

Cited by: 0
Authors
Han, Yutong [1]
Wang, Bin [1]
Yang, Xiaochun [1]
Zhu, Huaijie [1]
Affiliation
[1] Northeastern University, School of Computer Science and Engineering, Shenyang 110169, Liaoning, People's Republic of China
Keywords
Regular expression; LZ77; String matching; Self-index; Search
DOI
10.1007/978-3-319-55699-4_14
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Existing methods for regular expression matching on LZ78-compressed strings are inefficient. Moreover, LZ78 compression has shortcomings such as a higher compression ratio (i.e., larger compressed output) and slower decompression than LZ77, the earlier member of the Lempel-Ziv family. In this paper, we study regular expression matching on LZ77-compressed strings. We propose an efficient algorithm, RELZ, which prunes candidates using positive factors of the regular expression (a prefix and a suffix) and negative factors (substrings that cannot appear in any answer). To locate both kinds of factors quickly in the compressed string without decompressing it, we design an index called SSLZ, a variant of the suffix trie. In addition, we build bitmaps for the factors of the regular expression to detect potential match regions, and we propose block filtering to further reduce the number of candidates. Finally, we conduct a comprehensive performance evaluation on five real datasets to validate our ideas and the proposed algorithms. The experimental results show that RELZ significantly outperforms existing algorithms.
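The filter-then-verify idea described in the abstract can be illustrated with a minimal Python sketch. It is not the RELZ algorithm. It works on an uncompressed string, it uses plain `str.find` in place of the SSLZ index over the LZ77-compressed string, and it omits the bitmaps and block filtering. The helper names `occurrences` and `filtered_matches` and the example factors are hypothetical. The sketch assumes that every answer begins with the given prefix and ends with the given suffix.

```python
import re


def occurrences(text, factor):
    """Start positions of every (possibly overlapping) occurrence of factor."""
    positions = []
    pos = text.find(factor)
    while pos != -1:
        positions.append(pos)
        pos = text.find(factor, pos + 1)
    return positions


def filtered_matches(text, pattern, prefix, suffix, negatives):
    """Filter-then-verify sketch on an UNCOMPRESSED string.

    Hypothetical helper, not the paper's RELZ: assumes every answer starts
    with `prefix` and ends with `suffix` (positive factors) and contains none
    of the strings in `negatives` (negative factors).
    """
    regex = re.compile(pattern)
    # End offsets of candidate windows: each one finishes right after a suffix.
    ends = [p + len(suffix) for p in occurrences(text, suffix)]
    answers = []
    for start in occurrences(text, prefix):
        for end in ends:
            # The window must be long enough to begin with the prefix
            # and end with the suffix.
            if end - start < max(len(prefix), len(suffix)):
                continue
            window = text[start:end]
            # Prune: a negative factor can never occur inside an answer.
            if any(neg in window for neg in negatives):
                continue
            # Verify the surviving candidate with a full regex match.
            if regex.fullmatch(window):
                answers.append((start, end))
    return answers


print(filtered_matches("xab1cdyab22cdzab#cd", r"ab\d+cd", "ab", "cd", ["#"]))
# → [(1, 6), (7, 13)]
```

Every surviving window is verified with `re.fullmatch`, so the pruning steps only save work. They do not change the result, provided the listed negative factors truly cannot appear in an answer.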
Pages: 219-234
Number of pages: 16
Related papers (50 in total)
  • [31] CICERO: A Domain-Specific Architecture for Efficient Regular Expression Matching
    Parravicini, Daniele
    Conficconi, Davide
    Del Sozzo, Emanuele
    Pilato, Christian
    Santambrogio, Marco D.
    ACM TRANSACTIONS ON EMBEDDED COMPUTING SYSTEMS, 2021, 20 (05)
  • [32] Efficient Mapping of Nondeterministic Automata to FPGA for Fast Regular Expression Matching
    Korenek, Jan
    Kosar, Vlastimil
    PROCEEDINGS OF THE 13TH IEEE SYMPOSIUM ON DESIGN AND DIAGNOSTICS OF ELECTRONIC CIRCUITS AND SYSTEMS, 2010, : 54 - 59
  • [33] Exploring Different Automata Representations for Efficient Regular Expression Matching on GPUs
    Yu, Xiaodong
    Becchi, Michela
    ACM SIGPLAN NOTICES, 2013, 48 (08) : 287 - 288
  • [34] Resource-Efficient Regular Expression Matching Architecture for Text Analytics
    Atasu, Kubilay
    PROCEEDINGS OF THE 2014 IEEE 25TH INTERNATIONAL CONFERENCE ON APPLICATION-SPECIFIC SYSTEMS, ARCHITECTURES AND PROCESSORS (ASAP 2014), 2014, : 1 - 8
  • [35] Benchmarking Regular Expression Matching
    Roodt, Alexander
    Watling, Brendan Keith Mark
    Bester, Willem
    van der Merwe, Brink
    Sung, Sicheol
    Han, Yo-Sub
    IMPLEMENTATION AND APPLICATION OF AUTOMATA, CIAA 2024, 2024, 15015 : 316 - 331
  • [36] Greedy regular expression matching
    Frisch, A
    Cardelli, L
    AUTOMATA, LANGUAGES AND PROGRAMMING, PROCEEDINGS, 2004, 3142 : 618 - 629
  • [37] Regular Expression Search on Compressed Text
    Ganty, Pierre
    Valero, Pedro
    2019 DATA COMPRESSION CONFERENCE (DCC), 2019, : 528 - 537
  • [38] Sparse Regular Expression Matching
    Bille, Philip
    Gortz, Inge Li
    PROCEEDINGS OF THE 2024 ANNUAL ACM-SIAM SYMPOSIUM ON DISCRETE ALGORITHMS, SODA, 2024, : 3354 - 3375
  • [39] Selective Regular Expression Matching
    Stakhanova, Natalia
    Ren, Hanli
    Ghorbani, Ali A.
    INFORMATION SECURITY, 2011, 6531 : 226 - +
  • [40] Faster Regular Expression Matching
    Bille, Philip
    Thorup, Mikkel
    AUTOMATA, LANGUAGES AND PROGRAMMING, PT I, 2009, 5555 : 171 - +