PDF Malware Detection: Toward Machine Learning Modeling With Explainability Analysis

Cited by: 0
Authors
Hossain, G. M. Sakhawat [1, 2]
Deb, Kaushik [1]
Janicke, Helge [3, 4]
Sarker, Iqbal H. [3, 4]
Affiliations
[1] Chittagong Univ Engn & Technol, Dept Comp Sci & Engn, Chattogram 4349, Bangladesh
[2] Rangamati Sci & Technol Univ, Dept Comp Sci & Engn, Chattogram 4500, Bangladesh
[3] Cyber Secur Cooperat Res Ctr, Joondalup, WA 6027, Australia
[4] Edith Cowan Univ, Secur Res Inst, Sch Sci, Perth, WA 6027, Australia
Keywords
Cybersecurity; PDF malware; data analytics; machine learning; decision rule; explainable AI; human interpretation; FILES; DOCUMENTS
DOI
10.1109/ACCESS.2024.3357620
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
The Portable Document Format (PDF) is one of the most widely used file types, so attackers insert malicious code into PDF documents to compromise victims' equipment. Conventional detection techniques are often insufficient and may only partially prevent PDF malware, because the malware is versatile and the techniques depend excessively on a fixed, typical feature set. The primary goal of this work is to detect PDF malware efficiently and thereby alleviate these difficulties. To this end, we first build a comprehensive dataset of 15958 PDF samples that covers benign, malicious, and evasive behaviors. Using three well-known PDF analysis tools (PDFiD, PDFINFO, and PDF-PARSER), we extract significant features from the samples in this newly created dataset. In addition, we derive a number of features that have been experimentally shown to help classify PDF malware. Through empirical analysis of the extracted and derived features, we develop a method for building an efficient and explainable feature set. We explore different baseline machine learning classifiers and demonstrate an accuracy improvement of approximately 2% for the Random Forest classifier when it uses the selected feature set. Furthermore, we demonstrate the model's explainability by creating a decision tree that generates rules for human interpretation. Finally, we compare our results with previous studies and highlight some important findings.
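The feature-extraction step described in the abstract can be illustrated with a minimal sketch. It assumes a PDFiD-style approach of counting PDF name keywords in the raw bytes of a file. The keyword selection, the function names `keyword_counts` and `derived_features`, and the three derived features are hypothetical and chosen only for illustration; they are not the feature set from the paper. The sketch does not handle hex-escaped (obfuscated) names.

```python
import re

# Keywords of the kind a PDFiD-style scan counts.
# This particular selection is an assumption made for illustration.
KEYWORDS = ("/JS", "/JavaScript", "/OpenAction", "/AA", "/Launch", "/EmbeddedFile")


def keyword_counts(data: bytes) -> dict:
    """Count how often each PDF name keyword occurs in the raw bytes."""
    counts = {}
    for kw in KEYWORDS:
        # Reject matches followed by another letter or digit, so that a short
        # name such as /AA is not counted inside a longer name such as /AAB.
        pattern = re.compile(re.escape(kw.encode()) + rb"(?![A-Za-z0-9])")
        counts[kw] = len(pattern.findall(data))
    return counts


def derived_features(counts: dict) -> dict:
    """Combine raw counts into a few hypothetical derived features."""
    js_total = counts["/JS"] + counts["/JavaScript"]
    auto_total = counts["/OpenAction"] + counts["/AA"]
    return {
        "js_total": js_total,
        "auto_action_total": auto_total,
        # 1 when the file has both script content and an automatic trigger.
        "js_with_auto_action": int(js_total > 0 and auto_total > 0),
    }


# A tiny hand-written fragment standing in for a real PDF file.
sample = (
    b"%PDF-1.4\n"
    b"1 0 obj\n<< /Type /Catalog /OpenAction 2 0 R >>\nendobj\n"
    b"2 0 obj\n<< /S /JavaScript /JS (app.alert(1)) >>\nendobj\n"
)

counts = keyword_counts(sample)
features = derived_features(counts)
```

Feature vectors of this kind could then be passed to a classifier such as a Random Forest, or to a decision tree whose branches read as human-interpretable rules.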
Pages: 13833-13859
Page count: 27
Related papers
50 in total
  • [1] PDF Malware Detection Using Visualization and Machine Learning
    Liu, Ching-Yuan
    Chiu, Min-Yi
    Huang, Qi-Xian
    Sun, Hung-Min
    [J]. DATA AND APPLICATIONS SECURITY AND PRIVACY XXXV, 2021, 12840 : 209 - 220
  • [2] Toward Robust Classifiers for PDF Malware Detection
    Albahar, Marwan
    Thanoon, Mohammed
    Alzilai, Monaj
    Alrehily, Alaa
    Alfaar, Munirah
    Algamdi, Maimoona
    Alassaf, Norah
    [J]. CMC-COMPUTERS MATERIALS & CONTINUA, 2021, 69 (02): : 2181 - 2202
  • [3] PDF Malware Detection based on Stacking Learning
    Issakhani, Maryam
    Victor, Princy
    Tekeoglu, Ali
    Lashkari, Arash Habibi
    [J]. PROCEEDINGS OF THE 8TH INTERNATIONAL CONFERENCE ON INFORMATION SYSTEMS SECURITY AND PRIVACY (ICISSP), 2021, : 562 - 570
  • [4] Analysis of machine learning models for malware detection
    Rahul
    Kedia, Priyansh
    Sarangi, Subrat
    Monika
    [J]. JOURNAL OF DISCRETE MATHEMATICAL SCIENCES & CRYPTOGRAPHY, 2020, 23 (02): : 395 - 407
  • [5] ANALYSIS OF MACHINE LEARNING METHODS ON MALWARE DETECTION
    Aydogan, Emre
    Sen, Sevil
    [J]. 2014 22ND SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE (SIU), 2014, : 2066 - 2069
  • [6] Malware Analysis and Detection Using Machine Learning Algorithms
    Akhtar, Muhammad Shoaib
    Feng, Tao
    [J]. SYMMETRY-BASEL, 2022, 14 (11)
  • [7] A Novel Malware Analysis for Malware Detection and Classification using Machine Learning Algorithms
    Sethi, Kamalakanta
    Chaudhary, Shankar Kumar
    Tripathy, Bata Krishan
    Bera, Padmalochan
    [J]. SIN'17: PROCEEDINGS OF THE 10TH INTERNATIONAL CONFERENCE ON SECURITY OF INFORMATION AND NETWORKS, 2017, : 107 - 113
  • [8] Application of Machine Learning in Malware Detection
    Van Quynh, Trinh
    Hien, Vu Thanh
    Nguyen, Vu Thanh
    Bao, Huynh Quoc
    [J]. FUTURE DATA AND SECURITY ENGINEERING. BIG DATA, SECURITY AND PRIVACY, SMART CITY AND INDUSTRY 4.0 APPLICATIONS, FDSE 2022, 2022, 1688 : 362 - 374
  • [9] IoT Malware Detection with Machine Learning
    Buttyan, Levente
    Ferenc, Rudolf
    [J]. ERCIM NEWS, 2022, (129): : 17 - 19
  • [10] A Novel Malware Analysis Framework for Malware Detection and Classification using Machine Learning Approach
    Sethi, Kamalakanta
    Chaudhary, Shankar Kumar
    Tripathy, Bata Krishan
    Bera, Padmalochan
    [J]. ICDCN'18: PROCEEDINGS OF THE 19TH INTERNATIONAL CONFERENCE ON DISTRIBUTED COMPUTING AND NETWORKING, 2018