FCG-MFD: Benchmark function call graph-based dataset for malware family detection

Cited by: 3

Authors:

Hadi, Hassan Jalil [1]

Cao, Yue [1]

Li, Sifan [1]

Ahmad, Naveed [2]

Alshara, Mohammed Ali [2]

Affiliations:

[1] Wuhan Univ, Sch Cyber Sci & Engn, Wuhan, Peoples R China

[2] Prince Sultan Univ, Coll Comp & Informat Sci, Riyadh, Saudi Arabia

Source:

JOURNAL OF NETWORK AND COMPUTER APPLICATIONS | 2025 / Vol. 233

Keywords:

Malware detection; Malware family classification; Function Call Graph; Dataset

DOI:

10.1016/j.jnca.2024.104050

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology]

Discipline classification code:

0812

Abstract:

Cybercrime involving malware families is on the rise, despite the prevalence of antivirus software and of approaches for malware detection and classification. Security experts have applied Machine Learning (ML) techniques to identify these threats. However, such approaches need up-to-date malware datasets to keep improving as malware strains grow more sophisticated. We therefore present FCG-MFD, a benchmark dataset of extensive Function Call Graphs (FCGs) for malware family detection. The dataset is intended to help security systems stay resistant to emerging malware families. It comprises two sub-datasets (FCG and Metadata) totaling 100,000 samples collected from VirusSamples, Virusshare, VirusSign, theZoo, Vx-underground, and MalwareBazaar, and curated using FCGs and metadata to optimize the efficacy of ML algorithms. We propose a new malware analysis technique based on FCGs and graph embedding networks, which addresses the complexity of feature engineering in ML-based malware analysis. Our semantic feature extraction is inspired by Natural Language Processing (NLP): functions are treated as sentences and instructions as words. We use a node2vec-based graph embedding network to generate malware embedding vectors. These vectors combine structural and semantic features and enable automated, efficient malware analysis. We evaluate FCG-MFD on the two sub-datasets (FCG and Metadata). The resulting F1-scores of 99.14% and 99.28% are competitive with state-of-the-art (SOTA) methods.
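The abstract names node2vec as the basis of the graph embedding step but gives no implementation details. As a rough illustration only, the sketch below generates node2vec-style biased random walks over a toy function call graph. The function names in `CALL_EDGES`, the helpers `build_adjacency`, `node2vec_walk`, and `generate_walks`, and the `p`/`q` values are all hypothetical and are not taken from the paper. The graph is treated as undirected for simplicity, and the skip-gram training that would turn walks into embedding vectors is omitted.

```python
import random

# Hypothetical toy call graph: (caller, callee) pairs. Not from the paper.
CALL_EDGES = [
    ("main", "parse_args"),
    ("main", "decrypt_payload"),
    ("decrypt_payload", "xor_block"),
    ("decrypt_payload", "write_file"),
    ("main", "connect_c2"),
    ("connect_c2", "send_data"),
]


def build_adjacency(edges):
    """Build an undirected adjacency map from call edges (a simplifying assumption)."""
    adj = {}
    for caller, callee in edges:
        adj.setdefault(caller, set()).add(callee)
        adj.setdefault(callee, set()).add(caller)
    return adj


def node2vec_walk(adj, start, walk_length, p, q, rng):
    """One second-order biased walk in the style of node2vec.

    p is the return parameter and q the in-out parameter: a step back to the
    previous node gets weight 1/p, a step to a neighbor of the previous node
    gets weight 1, and any other step gets weight 1/q.
    """
    walk = [start]
    while len(walk) < walk_length:
        cur = walk[-1]
        neighbors = sorted(adj[cur])  # sorted for reproducibility
        if not neighbors:
            break
        if len(walk) == 1:
            # The first step has no previous node, so it is uniform.
            walk.append(rng.choice(neighbors))
            continue
        prev = walk[-2]
        weights = []
        for nxt in neighbors:
            if nxt == prev:
                weights.append(1.0 / p)
            elif nxt in adj[prev]:
                weights.append(1.0)
            else:
                weights.append(1.0 / q)
        walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
    return walk


def generate_walks(adj, num_walks, walk_length, p=1.0, q=1.0, seed=0):
    """Start num_walks walks from every node; returns a list of node-name lists."""
    rng = random.Random(seed)
    walks = []
    for _ in range(num_walks):
        for node in sorted(adj):
            walks.append(node2vec_walk(adj, node, walk_length, p, q, rng))
    return walks


if __name__ == "__main__":
    adjacency = build_adjacency(CALL_EDGES)
    for walk in generate_walks(adjacency, num_walks=1, walk_length=5, p=0.5, q=2.0):
        print(" -> ".join(walk))
```

In a full pipeline, walks like these would typically be fed to a skip-gram model as if each walk were a sentence of node names, yielding one vector per function. How the paper combines those vectors with instruction-level semantic features is not described in this record.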


Pages: 15
