FCG-MFD: Benchmark function call graph-based dataset for malware family detection

Cited by: 3
Authors
Hadi, Hassan Jalil [1]
Cao, Yue [1]
Li, Sifan [1]
Ahmad, Naveed [2]
Alshara, Mohammed Ali [2]
Affiliations
[1] Wuhan Univ, Sch Cyber Sci & Engn, Wuhan, Peoples R China
[2] Prince Sultan Univ, Coll Comp & Informat Sci, Riyadh, Saudi Arabia
Keywords
Malware detection; Malware family classification; Function Call Graph; Dataset;
DOI
10.1016/j.jnca.2024.104050
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Cybercrime involving malware families is on the rise, despite the prevalence of antivirus software and of approaches for malware detection and classification. Security experts have applied Machine Learning (ML) techniques to identify these threats, but such approaches need up-to-date malware datasets to keep improving as malware strains grow more sophisticated. We therefore present FCG-MFD, a benchmark dataset of extensive Function Call Graphs (FCGs) for malware family detection. The dataset is intended to help security systems stay resilient against emerging malware families. It has two sub-datasets (FCG and Metadata; 100,000 samples) drawn from VirusSamples, VirusShare, VirusSign, theZoo, Vx-underground, and MalwareBazaar, and curated using FCGs and metadata to improve the efficacy of ML algorithms. We also propose a malware analysis technique based on FCGs and graph embedding networks, which addresses the complexity of feature engineering in ML-based malware analysis. Our semantic feature extraction borrows from Natural Language Processing (NLP): functions are treated like sentences and instructions like words. We use a node2vec-based graph embedding network to generate malware embedding vectors, which combine structural and semantic features and enable automated, efficient malware analysis. We evaluate FCG-MFD on both sub-datasets (FCG and Metadata), obtaining F1-scores of 99.14% and 99.28%, which are competitive with state-of-the-art (SOTA) methods.
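The node2vec step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy call graph, the function names in it, and the parameter values (p, q, walk length, number of walks) are all assumptions made for illustration. The sketch only generates node2vec-style biased random walks over a function call graph; in a full pipeline, such walks would typically be fed to a skip-gram model to learn one embedding vector per function.

```python
import random

# Hypothetical toy function call graph: caller -> callees (all names are made up).
CALL_GRAPH = {
    "main": ["decrypt_payload", "connect_c2"],
    "decrypt_payload": ["xor_block"],
    "connect_c2": ["resolve_host", "send_data"],
    "xor_block": [],
    "resolve_host": [],
    "send_data": ["xor_block"],
}


def undirected_neighbors(graph):
    """Treat call edges as undirected so walks can move in both directions."""
    nbrs = {node: set() for node in graph}
    for caller, callees in graph.items():
        for callee in callees:
            nbrs[caller].add(callee)
            nbrs.setdefault(callee, set()).add(caller)
    return {node: sorted(adj) for node, adj in nbrs.items()}


def biased_walk(nbrs, start, length, p, q, rng):
    """One node2vec-style second-order random walk of at most `length` nodes."""
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        candidates = nbrs[cur]
        if not candidates:
            break  # isolated node: nothing to walk to
        if len(walk) == 1:
            # First step has no previous node, so sample uniformly.
            walk.append(rng.choice(candidates))
            continue
        prev = walk[-2]
        weights = []
        for nxt in candidates:
            if nxt == prev:
                weights.append(1.0 / p)  # return to the previous node
            elif nxt in nbrs[prev]:
                weights.append(1.0)      # stay within the previous node's neighborhood
            else:
                weights.append(1.0 / q)  # move further away from the previous node
        walk.append(rng.choices(candidates, weights=weights, k=1)[0])
    return walk


def generate_walks(graph, num_walks=5, length=6, p=1.0, q=0.5, seed=0):
    """Start `num_walks` walks from every node, in shuffled order each round."""
    rng = random.Random(seed)
    nbrs = undirected_neighbors(graph)
    walks = []
    for _ in range(num_walks):
        nodes = list(nbrs)
        rng.shuffle(nodes)
        for node in nodes:
            walks.append(biased_walk(nbrs, node, length, p, q, rng))
    return walks


walks = generate_walks(CALL_GRAPH)
```

Each walk is a sequence of function names, which plays the role of a "sentence" for a word-embedding model. With q below 1 the walks favor outward exploration of the graph, and with q above 1 they stay closer to the starting function; which setting suits malware FCGs is an empirical question this sketch does not answer.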
Pages: 15