FCG-MFD: Benchmark function call graph-based dataset for malware family detection

Cited by: 3
Authors
Hadi, Hassan Jalil [1]
Cao, Yue [1]
Li, Sifan [1]
Ahmad, Naveed [2]
Alshara, Mohammed Ali [2]
Affiliations
[1] Wuhan Univ, Sch Cyber Sci & Engn, Wuhan, Peoples R China
[2] Prince Sultan Univ, Coll Comp & Informat Sci, Riyadh, Saudi Arabia
Keywords
Malware detection; Malware family classification; Function Call Graph; Dataset
DOI
10.1016/j.jnca.2024.104050
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Cybercrime involving malware families is on the rise, despite the wide availability of antivirus software and of approaches for malware detection and classification. Security experts have applied Machine Learning (ML) techniques to identify these threats. However, such approaches require up-to-date malware datasets to keep improving as malware strains grow more sophisticated. We therefore present FCG-MFD, a benchmark dataset of Function Call Graphs (FCGs) for malware family detection. The dataset is intended to help security systems stay resilient against emerging malware families. It comprises two sub-datasets, FCG and Metadata (100,000 samples), drawn from VirusSamples, VirusShare, VirusSign, theZoo, Vx-underground, and MalwareBazaar and curated using FCGs and metadata to improve the efficacy of ML algorithms. We also propose a malware analysis technique based on FCGs and graph embedding networks, which reduces the feature-engineering burden of ML-based malware analysis. Semantic feature extraction is inspired by Natural Language Processing (NLP): functions are treated like sentences and instructions like words. A node2vec-based graph embedding network then generates malware embedding vectors that combine structural and semantic features, enabling automated and efficient malware analysis. We evaluate FCG-MFD on the two sub-datasets (FCG and Metadata), obtaining F1-scores of 99.14% and 99.28%, which are competitive with state-of-the-art (SOTA) methods.
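To illustrate the node2vec idea mentioned in the abstract, the sketch below generates second-order biased random walks over a small function call graph. It is a minimal illustration under stated assumptions, not the authors' pipeline: the function names, the toy edge list, the undirected treatment of call edges, and the values of the return parameter p and in-out parameter q are all hypothetical. In node2vec, walks like these are typically passed to a skip-gram model to learn node embedding vectors; that step is omitted here.

```python
import random


def build_undirected(edges):
    """Build an undirected adjacency map from (caller, callee) pairs."""
    adj = {}
    for caller, callee in edges:
        adj.setdefault(caller, set()).add(callee)
        adj.setdefault(callee, set()).add(caller)
    return adj


def node2vec_walk(adj, start, length, p=1.0, q=1.0, rng=None):
    """Generate one node2vec-style biased random walk of up to `length` nodes."""
    rng = rng or random.Random()
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        neighbors = sorted(adj[cur])  # sorted for reproducibility
        if not neighbors:
            break
        if len(walk) == 1:
            # First step has no previous node, so sample uniformly.
            walk.append(rng.choice(neighbors))
            continue
        prev = walk[-2]
        weights = []
        for nxt in neighbors:
            if nxt == prev:
                weights.append(1.0 / p)  # step back to the previous node
            elif nxt in adj[prev]:
                weights.append(1.0)      # stay one hop away from the previous node
            else:
                weights.append(1.0 / q)  # move further away from the previous node
        walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
    return walk


# Hypothetical toy call graph; the function names are illustrative only.
CALL_EDGES = [
    ("main", "decrypt_payload"),
    ("main", "connect_c2"),
    ("decrypt_payload", "xor_loop"),
    ("connect_c2", "send_beacon"),
    ("send_beacon", "xor_loop"),
]

adj = build_undirected(CALL_EDGES)
rng = random.Random(0)
# Three walks of six nodes from every function in the graph.
walks = [
    node2vec_walk(adj, node, 6, p=1.0, q=0.5, rng=rng)
    for node in sorted(adj)
    for _ in range(3)
]
```

With q below 1 the walks tend to move outward through the graph, which emphasizes broader structure; with q above 1 they stay close to the start node. Which setting suits call graphs is an empirical question that this sketch does not address.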
Pages: 15