PDF Malware Detection: Toward Machine Learning Modeling With Explainability Analysis

Authors
Hossain, G. M. Sakhawat [1, 2]
Deb, Kaushik [1]
Janicke, Helge [3, 4]
Sarker, Iqbal H. [3, 4]
Affiliations
[1] Chittagong Univ Engn & Technol, Dept Comp Sci & Engn, Chattogram 4349, Bangladesh
[2] Rangamati Sci & Technol Univ, Dept Comp Sci & Engn, Chattogram 4500, Bangladesh
[3] Cyber Secur Cooperat Res Ctr, Joondalup, WA 6027, Australia
[4] Edith Cowan Univ, Secur Res Inst, Sch Sci, Perth, WA 6027, Australia
Keywords
Cybersecurity; PDF malware; data analytics; machine learning; decision rule; explainable AI; human interpretation; FILES; DOCUMENTS
DOI
10.1109/ACCESS.2024.3357620
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The Portable Document Format (PDF) is one of the most widely used file types, so attackers embed harmful code in PDF documents to compromise victims' equipment. Conventional solutions and identification techniques are often insufficient and may only partially prevent PDF malware, owing to its versatile character and their excessive dependence on a certain typical feature set. The primary goal of this work is to detect PDF malware efficiently in order to alleviate these difficulties. To this end, we first develop a comprehensive dataset of 15,958 PDF samples that covers the benign, malicious, and evasive behaviors of PDF samples. Using three well-known PDF analysis tools (PDFiD, PDFINFO, and PDF-PARSER), we extract significant characteristics from the samples in this newly created dataset. In addition, we generate a number of derived features that have been experimentally shown to help in classifying PDF malware. We develop a method to build an efficient and explainable feature set through empirical analysis of the extracted and derived features. We explore different baseline machine learning classifiers and demonstrate an accuracy improvement of approximately 2% for the Random Forest classifier using the selected feature set. Furthermore, we demonstrate the model's explainability by creating a decision tree that generates rules for human interpretation. Finally, we compare our results with previous studies and point out some important findings.
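The idea of a decision tree that "generates rules for human interpretation" can be illustrated with a minimal sketch. It is not the authors' implementation: the tree below is hand-built, and the feature names (`javascript_count`, `page_count`), thresholds, and leaf labels are hypothetical values chosen only to show how each root-to-leaf path becomes one IF/THEN rule.

```python
# Toy decision tree: an internal node tests `feature <= threshold`,
# a leaf is a plain string holding the predicted class.
# All names, thresholds, and labels here are invented for illustration.
toy_tree = {
    "feature": "javascript_count",
    "threshold": 0,
    "left": "benign",            # taken when javascript_count <= 0
    "right": {                   # taken when javascript_count > 0
        "feature": "page_count",
        "threshold": 1,
        "left": "malicious",     # taken when page_count <= 1
        "right": "benign",       # taken when page_count > 1
    },
}


def extract_rules(node, conditions=()):
    """Return one IF/THEN rule string per root-to-leaf path."""
    if isinstance(node, str):  # leaf: emit the accumulated conditions
        premise = " AND ".join(conditions) if conditions else "TRUE"
        return [f"IF {premise} THEN {node}"]
    feature, threshold = node["feature"], node["threshold"]
    left_rules = extract_rules(
        node["left"], conditions + (f"{feature} <= {threshold}",)
    )
    right_rules = extract_rules(
        node["right"], conditions + (f"{feature} > {threshold}",)
    )
    return left_rules + right_rules


rules = extract_rules(toy_tree)
for rule in rules:
    print(rule)
```

Each printed line is a self-contained rule that an analyst can read without inspecting the model, which is the sense in which a shallow tree is interpretable. With a trained model, the same traversal would be applied to the learned tree rather than to a hand-written dictionary.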
Pages: 13833-13859
Page count: 27