Efficient Regular Expression Matching on Compressed Strings

Authors
Han, Yutong [1]
Wang, Bin [1]
Yang, Xiaochun [1]
Zhu, Huaijie [1]
Affiliations
[1] Northeastern Univ, Sch Comp Sci & Engn, Shenyang 110169, Liaoning, Peoples R China
Keywords
Regular expression; LZ77; String matching; Self-index; Search
DOI
10.1007/978-3-319-55699-4_14
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Existing methods for regular expression matching on LZ78 compressed strings are inefficient. Moreover, LZ78 compression has shortcomings of its own, such as a high compression ratio and slower decompression than LZ77 (a related Lempel-Ziv scheme). In this paper, we study regular expression matching on LZ77 compressed strings. We propose an efficient algorithm, RELZ, that prunes candidates using the positive factors of the regular expression (a prefix and a suffix) and its negative factors (substrings that cannot appear in an answer). To locate these two kinds of factors quickly on the compressed string without decompression, we design a variant of the suffix trie index, called SSLZ. In addition, we construct bitmaps for the factors of the regular expression to detect potential regions, and we propose block filtering to reduce the number of candidates. Finally, we conduct a comprehensive performance evaluation on five real datasets to validate our ideas and the proposed algorithms. The experimental results show that RELZ significantly outperforms the existing algorithms.
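The pruning idea in the abstract can be illustrated with a minimal Python sketch. It is not the RELZ algorithm: it works on a plain, uncompressed string, uses no SSLZ index, bitmaps, or block filtering, and takes the prefix, suffix, and negative factors as hand-supplied arguments instead of deriving them from the regular expression. The function names `find_all` and `factor_filtered_matches` are hypothetical and exist only for this illustration.

```python
import re


def find_all(text, pattern):
    """Return the start offsets of every (possibly overlapping) occurrence."""
    positions = []
    start = text.find(pattern)
    while start != -1:
        positions.append(start)
        start = text.find(pattern, start + 1)
    return positions


def factor_filtered_matches(text, regex, prefix, suffix, negative_factors):
    """Return (start, end) spans of substrings of text that fully match regex.

    Assumptions of this sketch (not taken from the paper):
    - prefix and suffix are literals that every match starts and ends with;
    - negative_factors are literals that can never occur inside a match;
    - the caller supplies all of them by hand.
    """
    compiled = re.compile(regex)
    suffix_starts = find_all(text, suffix)  # ascending order
    negative_spans = [
        (pos, pos + len(nf))
        for nf in negative_factors
        for pos in find_all(text, nf)
    ]
    results = []
    for p in find_all(text, prefix):
        # Smallest end offset of a negative factor that starts at or after p.
        # Any candidate [p, end) with end >= barrier contains that factor.
        barrier = min(
            (neg_end for neg_start, neg_end in negative_spans if neg_start >= p),
            default=len(text) + 1,
        )
        for s in suffix_starts:
            end = s + len(suffix)
            if s < p or end < p + len(prefix):
                continue  # suffix occurrence lies before this prefix occurrence
            if end >= barrier:
                break  # this and all later candidates contain a negative factor
            # Only the surviving candidates are verified against the regex.
            if compiled.fullmatch(text, p, end):
                results.append((p, end))
    return results


text = "TGAATCGGATTCCGAACG"
print(factor_filtered_matches(text, r"GA[AT]*CG", "GA", "CG", ["CC", "GG"]))
# [(1, 7), (13, 18)]
```

In this example the prefix occurrence at offset 7 is discarded without ever invoking the regex engine, because the only suffix occurrences that follow it lie beyond the negative factor `CC`.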
Pages: 219-234
Number of pages: 16