BHMDC: A byte and hex n-gram based malware detection and classification method

Cited by: 4
Authors
Tang, Yonghe [1]
Qi, Xuyan [1]
Jing, Jing [1]
Liu, Chunling [1]
Dong, Weiyu [1]
Affiliations
[1] State Key Lab Math Engn & Adv Comp, Zhengzhou 450000, Peoples R China
Keywords
Malware detection; Malware classification; Byte n-gram; Hex n-gram; Random forest; Light gradient boosting machine; MODEL;
DOI
10.1016/j.cose.2023.103118
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
In recent years, malware and its variants have proliferated, posing a grave threat to the security of systems and networks, so it is urgent to detect and classify malware in time to prevent the spread of malicious activity. However, existing malware detection and classification methods do not fully meet practical requirements. Machine learning-based approaches generally struggle to balance efficiency and accuracy because of imperfect feature representation, while deep learning-based methods are usually computationally intensive to train and deploy. To address this problem, we focus on improving feature extraction and the classification model, and propose a Byte and Hex n-gram based Malware Detection and Classification method called BHMDC. For malware detection, LightGBM is used with just 256-dimensional byte unigram features, achieving an accuracy of more than 99.70% on two datasets built by the authors, with less time consumption. For malware classification, block byte unigram and hex n-gram features are proposed and combined, which preserves more properties and profiles executable files at multiple granularities; random forest is then used to optimize the features by removing redundant information and reducing the dimensionality, and LightGBM is finally used to identify malware families. The proposed approach is evaluated experimentally and compared with state-of-the-art methods. It achieves 99.264% accuracy on the Microsoft malware classification challenge dataset and 99.775% accuracy on the Malimg dataset, substantially outperforming the other approaches. These promising results indicate that BHMDC can be used in antivirus software to detect malware variants and to help security analysts identify malware families. (c) 2023 Published by Elsevier Ltd.
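To make the two feature types named in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that a byte unigram feature is the 256-bin relative frequency of byte values in a file, and that a hex n-gram is an n-gram over the hexadecimal characters of the file's byte dump; the abstract does not spell out either definition. The function names `byte_unigram` and `hex_ngrams` are hypothetical, and the block-level unigram, random forest selection, and LightGBM stages are omitted.

```python
from collections import Counter


def byte_unigram(data: bytes) -> list:
    """Return a 256-dimensional vector of relative byte frequencies.

    Assumption: "byte unigram" means a normalized histogram of byte values.
    """
    counts = [0] * 256
    for b in data:
        counts[b] += 1
    total = len(data) or 1  # avoid division by zero on empty input
    return [c / total for c in counts]


def hex_ngrams(data: bytes, n: int = 2) -> Counter:
    """Count overlapping n-grams over the hex characters of the data.

    Assumption: "hex n-gram" is taken over the lowercase hex dump,
    so each byte contributes two characters.
    """
    hex_text = data.hex()
    return Counter(hex_text[i:i + n] for i in range(len(hex_text) - n + 1))


# Toy input: six bytes containing the "MZ" marker twice.
sample = bytes([0x4D, 0x5A, 0x90, 0x00, 0x4D, 0x5A])
features = byte_unigram(sample)
grams = hex_ngrams(sample, n=2)
```

Under these assumptions, `features` could be passed directly to a gradient boosting classifier for detection, while the `grams` counts would first need to be mapped onto a fixed vocabulary before being combined with other features.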
Pages: 13
Related papers
50 in total
  • [41] Malware classification based on double byte feature encoding
    Li, Lin
    Ding, Ying
    Li, Bo
    Qiao, Mengqing
    Ye, Biao
    ALEXANDRIA ENGINEERING JOURNAL, 2022, 61 (01) : 91 - 99
  • [42] Malicious Domain Names Detection Algorithm Based on N-Gram
    Zhao, Hong
    Chang, Zhaobin
    Bao, Guangbin
    Zeng, Xiangyan
    JOURNAL OF COMPUTER NETWORKS AND COMMUNICATIONS, 2019, 2019
  • [43] Speech Corpus Generation Based on N-gram Confidence Measure Classification
    Koctur, Tomas
    Ondas, Stanislav
    Juhar, Jozef
    PROCEEDINGS OF 2017 INTERNATIONAL SYMPOSIUM ELMAR, 2017, : 149 - 152
  • [44] Classification of ransomware families with machine learning based on N-gram of opcodes
    Zhang, Hanqi
    Xiao, Xi
    Mercaldo, Francesco
    Ni, Shiguang
    Martinelli, Fabio
    Sangaiah, Arun Kumar
    FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2019, 90 : 211 - 221
  • [45] An n-gram based approach to the automatic classification of schoolchildren's writing
    Cicres, Jordi
    Queralt, Sheila
    VIAL-VIGO INTERNATIONAL JOURNAL OF APPLIED LINGUISTICS, 2019, 16 : 53 - 80
  • [46] Web Page Classification using n-gram based URL Features
    Rajalakshmi, R.
    Aravindan, Chandrabose
    2013 FIFTH INTERNATIONAL CONFERENCE ON ADVANCED COMPUTING (ICOAC), 2013, : 15 - 21
  • [47] Optimisation of Character n-gram Profiles Method for Intrinsic Plagiarism Detection
    Kuta, Marcin
    Kitowski, Jacek
    ARTIFICIAL INTELLIGENCE AND SOFT COMPUTING, ICAISC 2014, PT II, 2014, 8468 : 500 - 511
  • [48] N-gram analysis for computer virus detection
    Reddy, D. Krishna Sandeep
    Pujari, Arun K.
    JOURNAL OF COMPUTER VIROLOGY AND HACKING TECHNIQUES, 2006, 2 (03) : 231 - 239
  • [49] Exploiting n-gram location for intrusion detection
    Angiulli, Fabrizio
    Argento, Luciano
    Furfaro, Angelo
    2015 IEEE 27TH INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE (ICTAI 2015), 2015, : 1093 - 1098
  • [50] Analysis of N-gram model on Telugu Document Classification
    Rani, B. Padmaja
    Vardhan, B. Vishnu
    Durga, A. Kanaka
    Reddy, L. Pratap
    Babu, A. Vinaya
    2008 IEEE CONGRESS ON EVOLUTIONARY COMPUTATION, VOLS 1-8, 2008, : 3199 - +