Malware classification with Word2Vec, HMM2Vec, BERT, and ELMo

Cited by: 0
Authors
Aparna Sunil Kale
Vinay Pandya
Fabio Di Troia
Mark Stamp
Affiliation
[1] San Jose State University, Department of Computer Science
Keywords
DOI
Not available
CLC number
Subject classification code
Abstract
Malware classification is an important and challenging problem in information security. Modern malware classification techniques rely on machine learning models that can be trained on features such as opcode sequences, API calls, and byte n-grams, among many others. In this research, we consider opcode features and we implement machine learning techniques, where we apply word embedding techniques—specifically, Word2Vec, HMM2Vec, BERT, and ELMo—as a feature engineering step. The resulting embedding vectors are then used as features for classification algorithms. The classification algorithms that we employ are support vector machines (SVM), k-nearest neighbor (kNN), random forests (RF), and convolutional neural networks (CNN). We conduct substantial experiments involving seven malware families. Our experiments extend beyond previous related work in this field. We show that we can obtain slightly better performance than in comparable previous work, with significantly faster model training times.
Pages: 1 - 16
Page count: 15
Related papers
50 in total
  • [1] Malware classification with Word2Vec, HMM2Vec, BERT, and ELMo
    Kale, Aparna Sunil
    Pandya, Vinay
    Di Troia, Fabio
    Stamp, Mark
    [J]. JOURNAL OF COMPUTER VIROLOGY AND HACKING TECHNIQUES, 2023, 19 (01) : 1 - 16
  • [2] Malware Classification Based on Multilayer Perception and Word2Vec for IoT Security
    Qiao, Yanchen
    Zhang, Weizhe
    Du, Xiaojiang
    Guizani, Mohsen
    [J]. ACM TRANSACTIONS ON INTERNET TECHNOLOGY, 2022, 22 (01)
  • [3] The Spectral Underpinning of word2vec
    Jaffe, Ariel
    Kluger, Yuval
    Lindenbaum, Ofir
    Patsenker, Jonathan
    Peterfreund, Erez
    Steinerberger, Stefan
    [J]. FRONTIERS IN APPLIED MATHEMATICS AND STATISTICS, 2020, 6
  • [4] Emerging Trends Word2Vec
    Church, Kenneth Ward
    [J]. NATURAL LANGUAGE ENGINEERING, 2017, 23 (01) : 155 - 162
  • [5] Research on Chinese Text Classification Based on Word2vec
    Yang, Zhi-Tong
    Zheng, Jun
    [J]. 2016 2ND IEEE INTERNATIONAL CONFERENCE ON COMPUTER AND COMMUNICATIONS (ICCC), 2016 : 1166 - 1170
  • [6] Chinese Sentiment Classification Using Extended Word2Vec
    张胜
    张鑫
    程佳军
    王晖
    [J]. Journal of Donghua University (English Edition), 2016, 33 (05) : 823 - 826
  • [7] A Study of Chinese Document Representation and Classification with Word2vec
    Zhu, Lei
    Wang, Guijun
    Zou, Xiancun
    [J]. PROCEEDINGS OF 2016 9TH INTERNATIONAL SYMPOSIUM ON COMPUTATIONAL INTELLIGENCE AND DESIGN (ISCID), VOL 1, 2016 : 298 - 302
  • [8] Short Text Classification Based on Wikipedia and Word2vec
    Liu Wensen
    Cao Zewen
    Wang Jun
    Wang Xiaoyi
    [J]. 2016 2ND IEEE INTERNATIONAL CONFERENCE ON COMPUTER AND COMMUNICATIONS (ICCC), 2016 : 1195 - 1200
  • [9] Microblogging Short Text Classification based on Word2Vec
    Zhang, Yonghui
    Liu, Jingang
    [J]. PROCEEDINGS OF THE 6TH INTERNATIONAL CONFERENCE ON ELECTRONIC, MECHANICAL, INFORMATION AND MANAGEMENT SOCIETY (EMIM), 2016, 40 : 395 - 401
  • [10] Snomed2Vec: representation of SNOMED CT terms with Word2Vec
    Martinez Soriano, Ignacio
    Castro Pena, Juan Luis
    Fernandez Breis, Jesualdo T.
    San Roman, Ignacio
    Alonso Barriuso, Adrian
    Guevara Baraza, David
    [J]. 2019 IEEE 32ND INTERNATIONAL SYMPOSIUM ON COMPUTER-BASED MEDICAL SYSTEMS (CBMS), 2019 : 678 - 683