Efficient Regular Expression Matching on Compressed Strings

Authors
Han, Yutong [1]
Wang, Bin [1]
Yang, Xiaochun [1]
Zhu, Huaijie [1]
Affiliation
[1] Northeastern Univ, Sch Comp Sci & Engn, Shenyang 110169, Liaoning, Peoples R China
Keywords
Regular expression; LZ77; String matching; Self-index; SEARCH
DOI
10.1007/978-3-319-55699-4_14
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Existing methods for regular expression matching on LZ78 compressed strings are inefficient. Moreover, LZ78 compression has shortcomings of its own, such as a high compression ratio and slower decompression than LZ77 (an earlier Lempel-Ziv scheme). In this paper, we study regular expression matching on LZ77 compressed strings. We propose an efficient algorithm, RELZ, that prunes candidates using the positive factors of the regular expression (a prefix and a suffix) and its negative factors (substrings that cannot appear in an answer). To locate both kinds of factors quickly on the compressed string without decompression, we design a variant of the suffix trie index, called SSLZ. In addition, we construct bitmaps for the factors of the regular expression to detect potential regions, and we propose block filtering to reduce the number of candidates. Finally, we conduct a comprehensive performance evaluation on five real datasets to validate our ideas and the proposed algorithms. The experimental results show that RELZ significantly outperforms existing algorithms.
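As an illustration of the factor-based pruning idea described in the abstract, the following is a minimal sketch, not the RELZ algorithm itself. It works on an uncompressed string and uses plain substring search in place of the SSLZ index, bitmaps, and block filtering, none of which are specified in this record. The function names (`find_all`, `factor_filtered_matches`) and the way the prefix, suffix, and negative factors are passed in by hand are assumptions made for the example.

```python
import re

def find_all(text, pat):
    """Return start offsets of every (possibly overlapping) occurrence of pat."""
    out, i = [], text.find(pat)
    while i != -1:
        out.append(i)
        i = text.find(pat, i + 1)
    return out

def factor_filtered_matches(text, regex, prefix, suffix, negatives):
    """Hypothetical helper: enumerate windows that start with `prefix`
    and end with `suffix`, discard windows containing a negative factor,
    and verify the survivors with a full regex match."""
    compiled = re.compile(regex)
    starts = find_all(text, prefix)
    # End offsets (exclusive) of each suffix occurrence, in ascending order.
    ends = [i + len(suffix) for i in find_all(text, suffix)]
    min_len = max(len(prefix), len(suffix))
    results = []
    for s in starts:
        for e in ends:
            if e - s < min_len:
                continue  # window too short to hold both positive factors
            window = text[s:e]
            if any(n in window for n in negatives):
                # Every longer window from s also contains this negative
                # factor, so no later end position can succeed.
                break
            if compiled.fullmatch(window):
                results.append((s, e))
    return results

if __name__ == "__main__":
    text = "zzabccdefxabefzz"
    print(factor_filtered_matches(text, r"ab[cd]*ef", "ab", "ef", ["x"]))
```

The final `fullmatch` call keeps the sketch correct even if the supplied factors are imprecise: the factors only decide which windows are worth verifying.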
Pages: 219-234
Page count: 16