What Can N-Grams Learn for Malware Detection?

Cited by: 0

Authors:

Zak, Richard [1]

Raff, Edward [1]

Nicholas, Charles [2]

Affiliations:

[1] Booz Allen Hamilton, Lab Phys Sci, Mclean, VA 22102 USA

[2] Univ Maryland Baltimore Cty, Baltimore, MD 21228 USA

Source:

PROCEEDINGS OF THE 2017 12TH INTERNATIONAL CONFERENCE ON MALICIOUS AND UNWANTED SOFTWARE (MALWARE) | 2017

Keywords:

SELECTION;

DOI:

Not available

Chinese Library Classification number:

TP31 [Computer Software];

Discipline classification code:

081202 ; 0835 ;

Abstract:

Recent work has shown that byte n-grams learn mostly low entropy features, such as function imports and strings, which has brought into question whether byte n-grams can learn information corresponding to higher entropy levels, such as binary code. We investigate that hypothesis in this work by performing byte n-gram analysis on only specific sub-sections of the binary file, and compare to results obtained by n-gram analysis on assembly code generated from disassembled binaries. We do this by leveraging the change in model performance and ensembles to glean insights about the data. In doing so we discover that byte n-grams can learn from the code regions, but do not necessarily learn any new information. We also discover that assembly n-grams may not be as effective as previously thought and that disambiguating instructions by their binary opcode, an approach not previously used for malware detection, is critical for model generalization.
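As a rough illustration of what "byte n-gram analysis on only specific sub-sections of the binary file" can mean in practice, the sketch below counts byte n-grams over a whole buffer and over one slice of it. It is not the authors' pipeline: the function names `byte_ngrams` and `section_ngrams`, the choice of `n`, the sample bytes, and the section offsets are all assumptions made for illustration.

```python
from collections import Counter


def byte_ngrams(data: bytes, n: int) -> Counter:
    """Count every contiguous n-byte window in `data`."""
    if n <= 0:
        raise ValueError("n must be positive")
    # A buffer shorter than n yields an empty range, hence an empty Counter.
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))


def section_ngrams(data: bytes, start: int, size: int, n: int) -> Counter:
    """Count n-grams only inside data[start:start + size].

    `start` and `size` are hypothetical section bounds; in a real binary
    they would come from parsing the file's section table.
    """
    return byte_ngrams(data[start:start + size], n)


# Made-up bytes standing in for a small code region (assumption).
sample = bytes([0x55, 0x8B, 0xEC, 0x55, 0x8B, 0xEC, 0x90])

whole_file_counts = byte_ngrams(sample, 3)
code_only_counts = section_ngrams(sample, start=3, size=4, n=2)
```

The resulting counters can be turned into feature vectors for a classifier; restricting the slice to one region is what lets a change in model performance be attributed to that region.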


Pages: 109-118

Page count: 10

