A Novel Feature Encoding Scheme for Machine Learning Based Malware Detection Systems

Times cited: 0

Authors:

Das, Vipin [1]

Nair, Binoy B. [2]

Thiruvengadathan, Rajagopalan [3]

Affiliations:

[1] Amrita Vishwa Vidyapeetham, Amrita School of Artificial Intelligence, Coimbatore 641112, India

[2] Amrita Vishwa Vidyapeetham, Department of Electronics and Communication Engineering, Amrita School of Engineering, Coimbatore 641112, India

[3] Southern Utah University, Department of Engineering and Technology, Cedar City, UT 84720, USA

Source:

IEEE ACCESS | 2024 / Vol. 12

Keywords:

Malware; Codes; Encoding; Machine learning; Feature extraction; Grippers; Static analysis; Computer security; Intrusion detection; Classification algorithms; Detection algorithms; Cybersecurity; categorical encoding; intrusion detection; machine learning; malware classification; malware detection; NETWORK INTRUSION DETECTION; UNSW-NB15 DATA SET; IOT; ALGORITHM; MECHANISM; INTERNET; TAXONOMY; THINGS; SELECTION; DEFENSE;

DOI:

10.1109/ACCESS.2024.3420080

CLC number (Chinese Library Classification):

TP [Automation technology, computer technology]

Discipline code:

0812

Abstract:

Malware detection is an ever-evolving area, since every advance in detection capability is matched by increasingly radical attempts to evade detection. As malware grows more sophisticated, innovative approaches to improving detection capability become paramount. Machine learning and deep learning models are increasingly used for malware detection; however, one of the most important and most frequently overlooked aspects of building such models is feature encoding. This paper examines how feature encoding affects the efficiency of threat detection and proposes a novel entropy-based encoding scheme for the categorical features present in data extracted from malicious inputs. The KDDCUP99, UNSW-NB15, and CIC-Evasive-PDFMal2022 datasets are used to evaluate the effectiveness of the proposed scheme, and its results are compared against seven other encoding schemes to assess its credibility and usability. The efficiency of the proposed system is evaluated by training various machine learning models on differently encoded versions of each dataset and measuring the classification performance of each model. Models trained with the proposed encoding scheme produced stable classification results and outperformed the other encoding schemes when dimensionality reduction was applied to the data. The ensemble classifier trained using the proposed scheme achieved an F1 score of 99.99% on the dimension-reduced, entropy-encoded KDDCUP99 dataset. On the CIC-Evasive-PDFMal2022 dataset, entropy encoding yielded slightly improved classification metrics, with the ensemble methods reaching a peak F1 score of 99.27%.
We also computed feature importance values for the features in each dataset to study how their contributions change when different categorical encoding schemes are applied to the data.
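
The abstract does not give the formula behind the proposed entropy-based encoding. The Python below is a minimal sketch that assumes one plausible reading: each category is replaced by the Shannon entropy of the class labels observed alongside it, which makes it a conditional-entropy form of target encoding. It shows how such an encoder could be fit on training data and then applied to a categorical column, such as a network protocol field. The names `shannon_entropy` and `ConditionalEntropyEncoder`, the fallback value for unseen categories, and the toy data are hypothetical. This is not the authors' implementation.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Sequence


def shannon_entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy, in bits, of the empirical distribution of `labels`."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    # H = sum_c p(c) * log2(1 / p(c)), with p(c) = count / n; pure groups give 0.0.
    return sum((c / n) * math.log2(n / c) for c in counts.values())


class ConditionalEntropyEncoder:
    """Hypothetical encoder (an assumption, not the paper's method): maps each
    category v of one categorical feature to H(Y | X = v), the entropy of the
    class labels seen with v in the training data."""

    def fit(self, values: Sequence[Hashable],
            labels: Sequence[Hashable]) -> "ConditionalEntropyEncoder":
        if len(values) != len(labels):
            raise ValueError("values and labels must have the same length")
        # Group the training labels by category value.
        groups: Dict[Hashable, List[Hashable]] = defaultdict(list)
        for v, y in zip(values, labels):
            groups[v].append(y)
        self.mapping_: Dict[Hashable, float] = {
            v: shannon_entropy(ys) for v, ys in groups.items()
        }
        # Assumed fallback for categories unseen during fit: entropy of all labels.
        self.default_ = shannon_entropy(labels)
        return self

    def transform(self, values: Sequence[Hashable]) -> List[float]:
        return [self.mapping_.get(v, self.default_) for v in values]

    def fit_transform(self, values: Sequence[Hashable],
                      labels: Sequence[Hashable]) -> List[float]:
        return self.fit(values, labels).transform(values)


if __name__ == "__main__":
    # Toy data invented for illustration: a protocol column and binary labels.
    protocol = ["tcp", "tcp", "udp", "udp", "icmp", "icmp"]
    label = ["attack", "normal", "normal", "normal", "attack", "attack"]
    encoder = ConditionalEntropyEncoder()
    print(encoder.fit_transform(protocol, label))  # [1.0, 1.0, 0.0, 0.0, 0.0, 0.0]
    print(encoder.transform(["gre"]))  # unseen category falls back to [1.0]
```

In this sketch the encoder is fit on the training split only, so test labels do not leak into the features. The toy output also exposes a limitation of this particular reading. Categories whose label distributions are equally pure receive the same code even when they point to different classes: here `udp` (all normal) and `icmp` (all attack) both encode to 0.0. The abstract does not say whether the paper's scheme handles this case.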

Pages: 91187 - 91216

Page count: 30