What Can N-Grams Learn for Malware Detection?

Cited by: 0
Authors
Zak, Richard [1]
Raff, Edward [1]
Nicholas, Charles [2]
Affiliations
[1] Booz Allen Hamilton, Lab Phys Sci, McLean, VA 22102 USA
[2] Univ Maryland Baltimore Cty, Baltimore, MD 21228 USA
Keywords
SELECTION;
DOI
Not available
Chinese Library Classification (CLC) number
TP31 [Computer Software];
Discipline classification code
081202; 0835;
Abstract
Recent work has shown that byte n-grams learn mostly low entropy features, such as function imports and strings, which has brought into question whether byte n-grams can learn information corresponding to higher entropy levels, such as binary code. We investigate that hypothesis in this work by performing byte n-gram analysis on only specific sub-sections of the binary file, and compare to results obtained by n-gram analysis on assembly code generated from disassembled binaries. We do this by leveraging the change in model performance and ensembles to glean insights about the data. In doing so we discover that byte n-grams can learn from the code regions, but do not necessarily learn any new information. We also discover that assembly n-grams may not be as effective as previously thought and that disambiguating instructions by their binary opcode, an approach not previously used for malware detection, is critical for model generalization.
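As a rough illustration of the term used in the abstract, the following is a minimal sketch of counting byte n-grams over a single region of a file. The helper `byte_ngrams`, the sample bytes in `region`, and the choice of n = 2 are assumptions made for illustration only; the abstract does not state the n-gram size, the feature selection, or the models the authors used.

```python
from collections import Counter


def byte_ngrams(data: bytes, n: int) -> Counter:
    """Count every contiguous n-byte window in `data` (hypothetical helper)."""
    if n <= 0:
        raise ValueError("n must be positive")
    # Slide a window of width n across the buffer, one byte at a time.
    # If the buffer is shorter than n, the range is empty and so is the result.
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))


# Made-up bytes standing in for one sub-section of a binary file.
# This is an assumption for the example, not data from the paper.
region = bytes([0x55, 0x8B, 0xEC, 0x55, 0x8B, 0xEC, 0xC3])

counts = byte_ngrams(region, 2)
for gram, freq in counts.most_common():
    print(gram.hex(), freq)
```

Restricting the input buffer to one sub-section, rather than the whole file, is the only part of this sketch that corresponds to the setup described in the abstract; everything downstream of the counts is left out.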
Pages: 109-118
Page count: 10