Pattern Matching in YARA: Improved Aho-Corasick Algorithm

Cited by: 4
Authors
Regeciova, Dominika [1]
Kolar, Dusan [1]
Milkovic, Marek [2]
Affiliations
[1] Brno Univ Technol, Fac Informat Technol, Brno 61266, Czech Republic
[2] Avast Software Sro, Brno 60200, Czech Republic
Source
IEEE ACCESS | 2021 / Volume 9 / Issue 09
Keywords
Pattern matching; Tools; Malware; Cats; Licenses; Companies; Syntactics; Aho-Corasick algorithm; pattern matching; regular expressions; YARA;
DOI
10.1109/ACCESS.2021.3074801
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
YARA is a pattern-matching tool used by malware analysts all over the world. YARA can scan files as well as process memory. It allows us to define sequences of symbols as text strings, hexadecimal strings, and regular expressions. However, the use of regular expressions is limited because of the concern that they can slow down the scanning process. In this paper, we analyze the true nature of regular expressions in YARA and their implementation. We have discovered several reasons why regular expressions can slow down scanning, which stem from the nature of the algorithm used, Aho-Corasick. We propose a new version of this algorithm and have implemented it in the original version of the tool. The experiments presented show that the speed of pattern matching with regular expressions can indeed be improved. In selected cases, the proposed version was about 27% faster than the original version. In instances where strings were optimized for the original version, the speed of the two versions was found to be comparable.
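For readers unfamiliar with the algorithm named in the abstract, the following is a minimal sketch of the classic textbook Aho-Corasick automaton for plain text patterns. It is not the improved version proposed in the paper, nor YARA's implementation, and it does not handle regular expressions. The function names `build_automaton` and `search` are hypothetical and chosen only for illustration.

```python
from collections import deque


def build_automaton(patterns):
    """Build a classic Aho-Corasick automaton (illustrative sketch only)."""
    goto = [{}]   # goto[state][char] -> next state (the trie edges)
    fail = [0]    # fail[state] -> state to fall back to on a mismatch
    out = [[]]    # out[state] -> patterns that end at this state

    # Phase 1: insert every pattern into the trie.
    for pat in patterns:
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(pat)

    # Phase 2: compute failure links breadth-first.
    # States at depth 1 keep fail = 0 (the root).
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit the patterns recognized at the fallback state.
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def search(text, automaton):
    """Return (start_index, pattern) for every occurrence in text."""
    goto, fail, out = automaton
    state = 0
    matches = []
    for i, ch in enumerate(text):
        # Follow failure links until some state accepts ch (or we hit the root).
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pat in out[state]:
            matches.append((i - len(pat) + 1, pat))
    return matches


automaton = build_automaton(["he", "she", "his", "hers"])
print(search("ushers", automaton))
```

The sketch scans the text once regardless of how many patterns are loaded, which is the property that makes this family of algorithms attractive for scanners that check many strings at the same time.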
Pages: 62857 - 62866
Page count: 10
Related papers
50 in total
  • [1] AUGMENTING THE AHO-CORASICK PATTERN-MATCHING MACHINE
    SRIDHAR, MA
    INTERNATIONAL JOURNAL OF COMPUTER MATHEMATICS, 1990, 32 (3-4) : 149 - 153
  • [2] Optimized Aho-Corasick String Matching Algorithm for Smart Phones
    Lu, Rui
    Pao, Derek
    2016 IEEE CONFERENCE ON COMMUNICATIONS AND NETWORK SECURITY (CNS), 2016, : 342 - 343
  • [3] Multiple-pattern matching in LZW compressed files using Aho-Corasick algorithm
    Tao, T
    Mukherjee, A
    DCC 2005: Data Compression Conference, Proceedings, 2005, : 482 - 482
  • [4] Efficient implementation of Aho-Corasick pattern matching automata using Unicode
    Nieminen, Janne
    Kilpelainen, Pekka
    SOFTWARE-PRACTICE & EXPERIENCE, 2007, 37 (06) : 669 - 690
  • [5] Heterogeneous Parallelization of Aho-Corasick Algorithm
    Soroushnia, Shima
    Daneshtalab, Masoud
    Plosila, Juha
    Liljeberg, Pasi
    8TH INTERNATIONAL CONFERENCE ON PRACTICAL APPLICATIONS OF COMPUTATIONAL BIOLOGY & BIOINFORMATICS (PACBB 2014), 2014, 294 : 153 - 160
  • [6] Space-Time Tradeoff in the Aho-Corasick String Matching Algorithm
    Xu, Yisi
    Pao, Derek
    2015 IEEE CONFERENCE ON COMMUNICATIONS AND NETWORK SECURITY (CNS), 2015, : 713 - 714
  • [7] Dictionary Matching: Review of the Aho-Corasick Algorithm and Vision for Large Dictionaries
    Qiao ZhanPeng
    Goto, Kento
    Ohshima, Takuya
    Tajima, Masahiro
    Motomichi, Toyama
    ICIST '18: PROCEEDINGS OF THE 8TH INTERNATIONAL CONFERENCE ON INFORMATION SYSTEMS AND TECHNOLOGIES, 2018,
  • [8] String Matching with Multicore CPUs: Performing Better with the Aho-Corasick Algorithm
    Arudchutha, S.
    Nishanthy, T.
    Ragel, R. G.
    2013 8TH IEEE INTERNATIONAL CONFERENCE ON INDUSTRIAL AND INFORMATION SYSTEMS (ICIIS), 2013, : 231 - 236
  • [9] Performance Optimization of Aho-Corasick Algorithm on a GPU
    Nhat-Phuong Tran
    Lee, Myungho
    Hong, Sugwon
    Bae, Jongwoo
    2013 12TH IEEE INTERNATIONAL CONFERENCE ON TRUST, SECURITY AND PRIVACY IN COMPUTING AND COMMUNICATIONS (TRUSTCOM 2013), 2013, : 1143 - 1152
  • [10] Speed-up of Aho-Corasick pattern matching machines by rearranging states
    Nishimura, T
    Fukamachi, S
    Shinohara, T
    EIGHTH SYMPOSIUM ON STRING PROCESSING AND INFORMATION RETRIEVAL, PROCEEDINGS, 2001, : 175 - 185