Pattern Matching in YARA: Improved Aho-Corasick Algorithm

Cited by: 4
Authors
Regeciova, Dominika [1]
Kolar, Dusan [1]
Milkovic, Marek [2]
Affiliations
[1] Brno Univ Technol, Fac Informat Technol, Brno 61266, Czech Republic
[2] Avast Software Sro, Brno 60200, Czech Republic
Source
IEEE ACCESS | 2021 / Volume 9 / Issue 09
Keywords
Pattern matching; Tools; Malware; Cats; Licenses; Companies; Syntactics; Aho-Corasick algorithm; pattern matching; regular expressions; YARA;
DOI
10.1109/ACCESS.2021.3074801
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
YARA is a tool for pattern matching used by malware analysts all over the world. YARA can scan files, as well as process memory. It allows us to define sequences of symbols as text strings, hexadecimal strings and regular expressions. However, the use of regular expressions is limited because of the concern that it can slow down the scanning process. In this paper, we analyze the true nature of regular expressions in YARA and their implementation. We have, in fact, discovered several reasons why regular expressions can slow down scanning based on the nature of the used algorithm, Aho-Corasick. We have proposed a new version of this algorithm and have implemented it in the original version of this tool. The experiments are presented, proving that the speed of pattern matching with regular expressions can indeed be improved. In selected cases, the proposed version was about 27% faster than the original version. And in instances where strings were optimized for the original version, their speed was found to be comparable.
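For readers unfamiliar with the algorithm named in the abstract, the following is a minimal sketch of the classic, textbook Aho-Corasick automaton for plain string patterns. It is not the improved version proposed in the paper, nor YARA's actual implementation, and it does not handle regular expressions. The function names `build_automaton` and `find_all` are illustrative assumptions.

```python
from collections import deque


def build_automaton(patterns):
    """Build a textbook Aho-Corasick automaton (illustrative, not YARA's code)."""
    goto = [{}]   # goto[state][symbol] -> next state (trie edges)
    fail = [0]    # fail[state] -> state of the longest proper suffix in the trie
    out = [[]]    # out[state] -> patterns recognized on reaching this state

    # Phase 1: insert every pattern into the trie.
    for pattern in patterns:
        state = 0
        for symbol in pattern:
            if symbol not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][symbol] = len(goto) - 1
            state = goto[state][symbol]
        out[state].append(pattern)

    # Phase 2: breadth-first pass to compute failure links.
    queue = deque(goto[0].values())  # depth-1 states keep fail == root
    while queue:
        state = queue.popleft()
        for symbol, nxt in goto[state].items():
            queue.append(nxt)
            fallback = fail[state]
            while fallback and symbol not in goto[fallback]:
                fallback = fail[fallback]
            fail[nxt] = goto[fallback].get(symbol, 0)
            # Inherit matches that end at the suffix state.
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def find_all(text, patterns):
    """Return (start_index, pattern) for every occurrence, in one pass over text."""
    goto, fail, out = build_automaton(patterns)
    state = 0
    matches = []
    for index, symbol in enumerate(text):
        # Follow failure links until a transition on this symbol exists.
        while state and symbol not in goto[state]:
            state = fail[state]
        state = goto[state].get(symbol, 0)
        for pattern in out[state]:
            matches.append((index - len(pattern) + 1, pattern))
    return matches


print(find_all("ushers", ["he", "she", "his", "hers"]))
# → [(1, 'she'), (2, 'he'), (2, 'hers')]
```

The text is scanned once regardless of how many patterns are loaded, which is why this family of automata suits scanners that carry large rule sets.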
Pages: 62857-62866
Page count: 10