A Universal Malicious Documents Static Detection Framework Based on Feature Generalization

Cited by: 10

Authors:

Lu, Xiaofeng [1]

Wang, Fei [1]

Jiang, Cheng [1]

Lio, Pietro [2]

Affiliations:

[1] Beijing Univ Posts & Telecommun, Sch Cyberspace Secur, Beijing 100876, Peoples R China

[2] Univ Cambridge, Comp Lab, Cambridge CB3 0FD, England

Source:

APPLIED SCIENCES-BASEL | 2021 / Vol. 11 / Issue 24

Funding:

National Natural Science Foundation of China; National Key Research and Development Program of China;

Keywords:

malicious document detection; static detection; feature generalization; machine learning;

DOI:

10.3390/app112412134

Chinese Library Classification (CLC) number:

O6 [Chemistry];

Discipline classification code:

0703 ;

Abstract:

In this study, Portable Document Format (PDF), Word, Excel, Rich Text Format (RTF) and image documents are taken as the research objects to study a fast, static method for detecting malicious documents. Malicious PDF and Word document features are abstracted and extended so that they can be used to detect other types of documents. A universal static detection framework for malicious documents based on feature generalization is then proposed. The generalized features include specification check errors, the structure path, code keywords, and the number of objects. The proposed method is verified on two datasets, and is compared with Kaspersky, NOD32, and McAfee antivirus software. The experimental results demonstrate that the proposed method achieves good performance in terms of detection accuracy, runtime, and scalability. The average F1-score across all document types is 0.99, and the average detection time per document is 0.5926 s, which is at the same level as the compared antivirus software.

Pages: 23

50 items in total

[1] UFADF: A Unified Feature Analysis and Detection Framework for Malicious Office Documents
Hu, Yang
Chen, Jia
Luo, Xin
2023 IEEE 22ND INTERNATIONAL CONFERENCE ON TRUST, SECURITY AND PRIVACY IN COMPUTING AND COMMUNICATIONS, TRUSTCOM, BIGDATASE, CSE, EUC, ISCI 2023, 2024, : 881 - 888
[2] A Malicious Code Static Detection Framework Based on Multi-Feature Ensemble Learning
Yang W.
Gao M.
Jiang T.
Jisuanji Yanjiu yu Fazhan/Computer Research and Development, 2021, 58 (05) : 1021 - 1034
[3] Static detection of malicious JavaScript-bearing PDF documents
Laskov, Pavel
Šrndić, Nedim
ACM International Conference Proceeding Series, 2011, : 373 - 382
[4] A STATIC DETECTION MODEL OF MALICIOUS PDF DOCUMENTS BASED ON NAIVE BAYESIAN CLASSIFIER TECHNOLOGY
Cheng, Huang
Yong, Fang
Liang, Liu
Wang, Lu-Rong
2012 INTERNATIONAL CONFERENCE ON WAVELET ACTIVE MEDIA TECHNOLOGY AND INFORMATION PROCESSING (LCWAMTIP), 2012, : 29 - 32
[5] The De-Obfuscation Method in the Static Detection of Malicious PDF Documents
Wang, Yuntao
Proceedings - 2021 7th Annual International Conference on Network and Information Systems for Computers, ICNISC 2021, 2021, : 44 - 47
[6] Static Detection of Malicious JavaScript-Bearing PDF Documents
Laskov, Pavel
Srndic, Nedim
27TH ANNUAL COMPUTER SECURITY APPLICATIONS CONFERENCE (ACSAC 2011), 2011, : 373 - 382
[7] Feature Selection Framework for Optimizing ML-based Malicious URL Detection
Shah, Sajjad H.
Garu, Amit
Nguyen, Duong N.
Borowczak, Mike
2024 CYBER AWARENESS AND RESEARCH SYMPOSIUM, CARS 2024, 2024,
[8] Feature representation and selection in malicious code detection methods based on static system calls
Ding Yuxin
Yuan Xuebing
Zhou Di
Dong Li
An Zhanchao
COMPUTERS & SECURITY, 2011, 30 (6-7) : 514 - 524
[9] An efficient malicious webpage static detection framework based on optimized Bayesian and hybrid machine learning
Yang, Fan
Zhu, Chaoqun
Xu, Heng
Qian, Yongfeng
Song, Jun
CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2022, 34 (10):
[10] Static detection of malicious PowerShell based on word embeddings
Mimura, Mamoru
Tajiri, Yui
INTERNET OF THINGS, 2021, 15
