Malware classification with Word2Vec, HMM2Vec, BERT, and ELMo

Authors
Aparna Sunil Kale
Vinay Pandya
Fabio Di Troia
Mark Stamp
Affiliation
[1] San Jose State University, Department of Computer Science
Abstract
Malware classification is an important and challenging problem in information security. Modern malware classification techniques rely on machine learning models that can be trained on features such as opcode sequences, API calls, and byte n-grams, among many others. In this research, we consider opcode features and we implement machine learning techniques, where we apply word embedding techniques—specifically, Word2Vec, HMM2Vec, BERT, and ELMo—as a feature engineering step. The resulting embedding vectors are then used as features for classification algorithms. The classification algorithms that we employ are support vector machines (SVM), k-nearest neighbor (kNN), random forests (RF), and convolutional neural networks (CNN). We conduct substantial experiments involving seven malware families. Our experiments extend beyond previous related work in this field. We show that we can obtain slightly better performance than in comparable previous work, with significantly faster model training times.
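The pipeline described in the abstract (opcode sequences, then embedding vectors, then a classifier) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a simple windowed co-occurrence embedding as a stand-in for Word2Vec, averages the opcode vectors of each sample, and classifies with a 1-nearest-neighbor rule. The opcode sequences, the family labels `famA` and `famB`, and all function names are hypothetical.

```python
import math
from collections import Counter


def cooccurrence_embeddings(sequences, window=2):
    """Map each opcode to an L2-normalized vector of windowed co-occurrence counts.

    Assumption: this count-based embedding only stands in for Word2Vec here.
    """
    vocab = sorted({op for seq in sequences for op in seq})
    index = {op: i for i, op in enumerate(vocab)}
    vectors = {op: [0.0] * len(vocab) for op in vocab}
    for seq in sequences:
        for i, op in enumerate(seq):
            lo, hi = max(0, i - window), min(len(seq), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[op][index[seq[j]]] += 1.0
    for op, vec in vectors.items():
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors[op] = [x / norm for x in vec]
    return vectors


def sample_features(seq, vectors):
    """Average the opcode vectors of one sample; unknown opcodes are skipped."""
    known = [vectors[op] for op in seq if op in vectors]
    dim = len(next(iter(vectors.values())))
    if not known:
        return [0.0] * dim
    return [sum(col) / len(known) for col in zip(*known)]


def knn_predict(train_X, train_y, x, k=1):
    """Return the majority label among the k training points nearest to x."""
    nearest = sorted((math.dist(x, t), y) for t, y in zip(train_X, train_y))
    votes = Counter(label for _, label in nearest[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two made-up families with distinct opcode patterns.
train = [
    (["mov", "push", "call", "mov", "push", "call"], "famA"),
    (["push", "call", "mov", "push", "call", "mov"], "famA"),
    (["xor", "add", "jmp", "xor", "add", "jmp"], "famB"),
    (["add", "jmp", "xor", "add", "jmp", "xor"], "famB"),
]

vectors = cooccurrence_embeddings([seq for seq, _ in train])
train_X = [sample_features(seq, vectors) for seq, _ in train]
train_y = [label for _, label in train]

pred_a = knn_predict(train_X, train_y, sample_features(["mov", "push", "call", "mov"], vectors))
pred_b = knn_predict(train_X, train_y, sample_features(["xor", "add", "jmp"], vectors))
print(pred_a, pred_b)
```

In practice the embedding step would be replaced by a trained model such as Word2Vec, and the nearest-neighbor rule by any of the classifiers named in the abstract (SVM, kNN, RF, or CNN); the averaging step is one simple way to turn per-opcode vectors into a fixed-length feature vector per sample.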
Pages: 1 / 16
Page count: 15