FCG-MFD: Benchmark function call graph-based dataset for malware family detection

Cited by: 3
Authors
Hadi, Hassan Jalil [1]
Cao, Yue [1]
Li, Sifan [1]
Ahmad, Naveed [2]
Alshara, Mohammed Ali [2]
Affiliations
[1] Wuhan Univ, Sch Cyber Sci & Engn, Wuhan, Peoples R China
[2] Prince Sultan Univ, Coll Comp & Informat Sci, Riyadh, Saudi Arabia
Keywords
Malware detection; Malware family classification; Function Call Graph; Dataset
DOI
10.1016/j.jnca.2024.104050
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Cybercrime involving malware families is on the rise, and this growth persists despite the wide availability of antivirus software and of approaches for malware detection and classification. Security experts have applied Machine Learning (ML) techniques to identify these threats. However, such approaches need up-to-date malware datasets to keep improving as malware strains grow more sophisticated. We therefore present FCG-MFD, a benchmark dataset with extensive Function Call Graphs (FCGs) for malware family detection. The dataset is intended to help security systems remain resilient against emerging malware families. It comprises two sub-datasets (FCG and Metadata) totalling 100,000 samples drawn from VirusSamples, Virusshare, VirusSign, theZoo, Vx-underground, and MalwareBazaar, curated using FCGs and metadata to improve the efficacy of ML algorithms. We propose a new malware analysis technique based on FCGs and graph embedding networks, which addresses the complexity of feature engineering in ML-based malware analysis. Our approach to extracting semantic features is inspired by Natural Language Processing (NLP) tasks: functions are treated as sentences and instructions as words. We use a node2vec-based graph embedding network to generate malware embedding vectors. These vectors combine structural and semantic features and enable automated, efficient malware analysis. We evaluate FCG-MFD on the two sub-datasets (FCG and Metadata). The resulting F1-scores of 99.14% and 99.28% are competitive with state-of-the-art (SOTA) methods.
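The node2vec step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy call graph, the function names in it, the choice to treat call edges as undirected, and the values of `p` and `q` are all assumptions made for illustration. The sketch only generates the second-order biased random walks that node2vec uses; in node2vec those walks are then fed to a skip-gram model to obtain node embedding vectors.

```python
import random


def node2vec_walk(graph, start, length, p=1.0, q=1.0, rng=None):
    """Generate one node2vec-style biased random walk.

    graph:  dict mapping each node to a set of neighbouring nodes
    p:      return parameter (small p makes revisiting the previous node likelier)
    q:      in-out parameter (small q favours moving away from the previous node)
    """
    rng = rng or random.Random()
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        neighbors = sorted(graph.get(cur, ()))
        if not neighbors:
            break  # dead end: stop the walk early
        if len(walk) == 1:
            # First step has no previous node, so sample uniformly.
            walk.append(rng.choice(neighbors))
            continue
        prev = walk[-2]
        weights = []
        for nxt in neighbors:
            if nxt == prev:
                weights.append(1.0 / p)  # step back to the previous node
            elif nxt in graph.get(prev, ()):
                weights.append(1.0)      # stay within distance 1 of prev
            else:
                weights.append(1.0 / q)  # move further away from prev
        walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
    return walk


# Hypothetical function call graph; names are invented for this example.
# Call edges are stored in both directions (undirected) as a simplification.
fcg = {
    "main": {"parse_args", "decrypt_payload"},
    "parse_args": {"main"},
    "decrypt_payload": {"main", "write_file", "connect_c2"},
    "write_file": {"decrypt_payload"},
    "connect_c2": {"decrypt_payload"},
}

# One walk per function, with a fixed seed for repeatability.
walks = [
    node2vec_walk(fcg, node, length=6, p=1.0, q=0.5, rng=random.Random(0))
    for node in sorted(fcg)
]
```

Each walk is a sequence of function names, so the list of walks can be handed to any skip-gram trainer in the same way a corpus of sentences would be.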
Pages: 15