PDF Malware Detection Using Visualization and Machine Learning

被引:2
|
作者
Liu, Ching-Yuan [1 ]
Chiu, Min-Yi [2 ]
Huang, Qi-Xian [2 ]
Sun, Hung-Min [1 ]
机构
[1] Natl Tsing Hua Univ, Dept Comp Sci, Hsinchu, Taiwan
[2] Natl Tsing Hua Univ, Inst Informat Syst & Applicat, Hsinchu, Taiwan
关键词
Malware detection; PDF malware; Malware visualization; Machine learning;
D O I
10.1007/978-3-030-81242-3_12
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
Recently, as more and more disasters caused by malware have been reported worldwide, people started to pay more attention to malware detection to prevent malicious attacks in advance. According to the diversity of the software platforms that people use, the malware also varies pretty much, for example: Xcode Ghost on iOS apps, FakePlayer on Android apps, and WannaCrypt on PC. Moreover, most of the time people ignore the potential security threats around us while surfing the internet, processing files or even reading email. The Portable Document Format (PDF) file, one of the most commonly used file types in the world, can be used to store texts, images, multimedia contents, and even scripts. However, with the increasing popularity and demands of PDF files, only a small fraction of people know how easy it could be to conceal malware in normal PDF files. In this paper, we propose a novel technique combining Malware Visualization and Image Classification to detect PDF files and identify which ones might be malicious. By extracting data from PDF files and traversing each object within, we can obtain the holistic treelike structure of PDF files. Furthermore, according to the signature of the objects in the files, we assign different colors obtained from SimHash to generate RGB images. Lastly, our proposed model trained by the VGG19 with CNN architecture achieved up to 0.973 accuracy and 0.975 F1-score to distinguish malicious PDF files, which is viable for personal, or enterprise-wide use and easy to implement.
引用
收藏
页码:209 / 220
页数:12
相关论文
共 50 条
  • [1] PDF Malware Detection: Toward Machine Learning Modeling With Explainability Analysis
    Hossain, G. M. Sakhawat
    Deb, Kaushik
    Janicke, Helge
    Sarker, Iqbal H.
    [J]. IEEE ACCESS, 2024, 12 : 13833 - 13859
  • [2] Malware Detection Using Machine Learning
    Kumar, Ajay
    Abhishek, Kumar
    Shah, Kunjal
    Patel, Divy
    Jain, Yash
    Chheda, Harsh
    Nerurka, Pranav
    [J]. KNOWLEDGE GRAPHS AND SEMANTIC WEB, KGSWC 2020, 2020, 1232 : 61 - 71
  • [3] PDF Malware Detection based on Stacking Learning
    Issakhani, Maryam
    Victor, Princy
    Tekeoglu, Ali
    Lashkari, Arash Habibi
    [J]. PROCEEDINGS OF THE 8TH INTERNATIONAL CONFERENCE ON INFORMATION SYSTEMS SECURITY AND PRIVACY (ICISSP), 2021, : 562 - 570
  • [4] Robust PDF Malware Detection with Image Visualization and Processing Techniques
    Corum, Andrew
    Jenkins, Donovan
    Zheng, Jun
    [J]. 2019 2ND INTERNATIONAL CONFERENCE ON DATA INTELLIGENCE AND SECURITY (ICDIS 2019), 2019, : 108 - 114
  • [5] Android Malware Detection Using Machine Learning
    Droos, Ayat
    Al-Mahadeen, Awss
    Al-Harasis, Tasnim
    Al-Attar, Rama
    Ababneh, Mohammad
    [J]. 2022 13TH INTERNATIONAL CONFERENCE ON INFORMATION AND COMMUNICATION SYSTEMS (ICICS), 2022, : 36 - 41
  • [6] A Novel Malware Detection System Based On Machine Learning and Binary Visualization
    Baptista, Irina
    Shiaeles, Stavros
    Kolokotronis, Nicholas
    [J]. 2019 IEEE INTERNATIONAL CONFERENCE ON COMMUNICATIONS WORKSHOPS (ICC WORKSHOPS), 2019,
  • [7] Automatic malware classification and new malware detection using machine learning
    Liu, Liu
    Wang, Bao-sheng
    Yu, Bo
    Zhong, Qiu-xi
    [J]. FRONTIERS OF INFORMATION TECHNOLOGY & ELECTRONIC ENGINEERING, 2017, 18 (09) : 1336 - 1347
  • [8] Automatic malware classification and new malware detection using machine learning
    Liu Liu
    Bao-sheng Wang
    Bo Yu
    Qiu-xi Zhong
    [J]. Frontiers of Information Technology & Electronic Engineering, 2017, 18 : 1336 - 1347
  • [9] Malware Analysis and Detection Using Machine Learning Algorithms
    Akhtar, Muhammad Shoaib
    Feng, Tao
    [J]. SYMMETRY-BASEL, 2022, 14 (11):
  • [10] Android Malware Detection Using Machine Learning: A Review
    Chowdhury, Naseef-Ur-Rahman
    Haque, Ahshanul
    Soliman, Hamdy
    Hossen, Mohammad Sahinur
    Fatima, Tanjim
    Ahmed, Imtiaz
    [J]. INTELLIGENT SYSTEMS AND APPLICATIONS, VOL 3, INTELLISYS 2023, 2024, 824 : 507 - 522