A Malicious Code Static Detection Framework Based on Multi-Feature Ensemble Learning

Cited by: 0

Authors:

Yang W. ^{[1,2,3]}

Gao M. ^{[1,2,3]}

Jiang T. ^{[1,2,3]}

Affiliations:

[1] School of Cyber Science and Engineering, Southeast University, Nanjing

[2] Key Laboratory of Computer Network and Information Integration (Southeast University), Ministry of Education, Nanjing

[3] Jiangsu Provincial Key Laboratory of Computer Network Technology, Southeast University, Nanjing

Source:

Jisuanji Yanjiu yu Fazhan/Computer Research and Development | 2021, Vol. 58, No. 05

Funding:

National Natural Science Foundation of China

Keywords:

Ensemble learning; Malicious code; Multiple features; Policy voting; Static detection;

DOI:

10.7544/issn1000-1239.2021.20200912

Abstract:

With the spread of the Internet and the rapid development of 5G communication technology, threats to cyberspace are increasing; in particular, the amount of malware is growing exponentially and the number of variants within malware families is growing explosively. Traditional signature-based malware detection is too slow to handle the millions of new malware samples that emerge every day, while the false positive and false negative rates of general machine learning classifiers are too high. At the same time, malware packing, obfuscation, and other adversarial techniques make detection even harder. To address this, we propose a static malware detection framework based on multi-feature ensemble learning. We extract the non-PE (Portable Executable) structure feature, visible string feature, sink assembly code sequence feature, PE structure feature, and function call relationship feature from malware, construct a model matched to each feature, and use Bagging and Stacking ensemble algorithms to reduce the risk of overfitting. Finally, we adopt a weighted voting algorithm to further aggregate the outputs of the ensemble models. The experimental results show that the detection accuracy of the multi-feature, multi-model aggregation algorithm can reach 96.99%, which indicates that the method identifies malware better than other static detection methods and achieves a higher recognition rate for malware that uses packing or obfuscation techniques. © 2021, Science Press. All rights reserved.
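
The final aggregation step named in the abstract, weighted voting over per-feature models, can be illustrated with a minimal sketch. The abstract does not state how the weights are chosen or what decision threshold is used, so the function name `weighted_vote`, the feature names, the weights, and the 0.5 threshold below are all hypothetical and are not the authors' implementation.

```python
from typing import Dict, Tuple


def weighted_vote(
    probs: Dict[str, float],
    weights: Dict[str, float],
    threshold: float = 0.5,  # assumed decision threshold, not from the paper
) -> Tuple[float, bool]:
    """Combine per-model malicious probabilities into one decision.

    probs:   model name -> predicted probability that the sample is malicious
    weights: model name -> non-negative voting weight (hypothetical values)
    Returns the weight-normalised score and whether it reaches the threshold.
    """
    total = sum(weights[name] for name in probs)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    score = sum(weights[name] * p for name, p in probs.items()) / total
    return score, score >= threshold


# Hypothetical outputs of three per-feature ensemble models for one sample.
example_probs = {"pe_structure": 0.9, "visible_strings": 0.2, "call_graph": 0.7}
example_weights = {"pe_structure": 2.0, "visible_strings": 1.0, "call_graph": 1.0}
example_score, example_label = weighted_vote(example_probs, example_weights)
```

In this sketch a model with a larger weight pulls the combined score toward its own prediction, which is the general idea behind letting stronger per-feature models dominate the vote.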


Pages: 1021-1034

Page count: 13
