Deobfuscation, unpacking, and decoding of obfuscated malicious JavaScript for machine learning models detection performance improvement

Cited by: 21

Authors:

Ndichu, Samuel [1]

Kim, Sangwook [1]

Ozawa, Seiichi [1,2]

Affiliations:

[1] Kobe Univ, Grad Sch Engn, Kobe, Hyogo, Japan

[2] Kobe Univ, Ctr Math & Data Sci, Kobe, Hyogo, Japan

Source:

CAAI TRANSACTIONS ON INTELLIGENCE TECHNOLOGY | 2020 / Vol. 5 / Issue 03

DOI:

10.1049/trit.2020.0026

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Obfuscation is rampant in both benign and malicious JavaScript (JS) codes. It generates an obscure and undetectable code that hinders comprehension and analysis. Therefore, accurate detection of JS codes that masquerade as innocuous scripts is vital. The existing deobfuscation methods assume that a specific tool can recover an original JS code entirely. For a multi-layer obfuscation, general tools realize a formatted JS code, but some sections remain encoded. For the detection of such codes, this study performs Deobfuscation, Unpacking, and Decoding (DUD-preprocessing) by function redefinition using a Virtual Machine (VM), a JS code editor, and a Python int_to_str() function to facilitate feature learning by the FastText model. The learned feature vectors are passed to a classifier model that judges the maliciousness of a JS code. In performance evaluation, the authors use Hynek Petrak's dataset for obfuscated malicious JS codes, and the SRILAB dataset and the Majestic Million service top 10,000 websites for obfuscated benign JS codes. They then compare the performance to other models on the detection of DUD-preprocessed obfuscated malicious JS codes. Their experimental results show that the proposed approach enhances feature learning and provides improved accuracy in the detection of obfuscated malicious JS codes.
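
The decoding step mentioned in the abstract can be illustrated with a small sketch. The abstract names an int_to_str() function but does not show it, so everything below is an assumption for illustration only and not the authors' implementation: the body of int_to_str(), the helper decode_char_codes(), and the choice of String.fromCharCode(...) with decimal arguments as the encoded pattern are all hypothetical. The sketch shows only how integer character codes left in a partially deobfuscated script might be turned back into readable string literals before feature learning.

```python
import re

# Hypothetical stand-in: the abstract names int_to_str() but does not define it.
def int_to_str(codes):
    """Turn a sequence of integer character codes into a string."""
    return "".join(chr(c) for c in codes)

# Matches calls such as String.fromCharCode(97, 108, 101) with decimal arguments only.
_FROM_CHAR_CODE = re.compile(r"String\.fromCharCode\(\s*(\d+(?:\s*,\s*\d+)*)\s*\)")

def decode_char_codes(js_source):
    """Replace decimal String.fromCharCode(...) calls with quoted string literals."""
    def _replace(match):
        codes = [int(tok) for tok in match.group(1).split(",")]
        decoded = int_to_str(codes)
        # Escape backslashes and double quotes; control characters are left
        # as-is in this sketch.
        escaped = decoded.replace("\\", "\\\\").replace('"', '\\"')
        return '"' + escaped + '"'
    return _FROM_CHAR_CODE.sub(_replace, js_source)

if __name__ == "__main__":
    sample = "var f = String.fromCharCode(97, 108, 101, 114, 116);"
    print(decode_char_codes(sample))
```

A real DUD-style pipeline would need to handle many more encodings (hex or octal escapes, packed eval payloads, nested layers); this sketch covers a single pattern so that the idea of decoding residual encoded sections is concrete.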

Pages: 184-192

Page count: 9
