FCG-MFD: Benchmark function call graph-based dataset for malware family detection

Cited by: 3
Authors
Hadi, Hassan Jalil [1]
Cao, Yue [1]
Li, Sifan [1]
Ahmad, Naveed [2]
Alshara, Mohammed Ali [2]
Affiliations
[1] Wuhan Univ, Sch Cyber Sci & Engn, Wuhan, Peoples R China
[2] Prince Sultan Univ, Coll Comp & Informat Sci, Riyadh, Saudi Arabia
Keywords
Malware detection; Malware family classification; Function Call Graph; Dataset;
DOI
10.1016/j.jnca.2024.104050
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Cybercrime involving malware families is on the rise, and this growth persists despite the prevalence of antivirus software and of approaches for malware detection and classification. Security experts have applied Machine Learning (ML) techniques to identify these threats. However, such approaches need up-to-date malware datasets to keep improving as malware strains grow more sophisticated. We therefore present FCG-MFD, a benchmark dataset of extensive Function Call Graphs (FCGs) for malware family detection. The dataset is intended to help security systems stay resilient against emerging malware families. It has two sub-datasets (FCG and Metadata; 100,000 samples) drawn from VirusSamples, VirusShare, VirusSign, theZoo, Vx-underground, and MalwareBazaar, curated using FCGs and metadata to improve the efficacy of ML algorithms. We propose a new malware analysis technique based on FCGs and graph embedding networks, which addresses the complexity of feature engineering in ML-based malware analysis. Our semantic feature extraction is inspired by Natural Language Processing (NLP): functions are treated like sentences and instructions like words. We use a node2vec-based graph embedding network to generate malware embedding vectors. These vectors combine structural and semantic features, enabling automated and efficient malware analysis. We use two datasets (FCG and Metadata) to assess FCG-MFD performance; the F1-scores of 99.14% and 99.28% are competitive with state-of-the-art (SOTA) methods.
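To make the node2vec step mentioned in the abstract more concrete, the following is a minimal sketch of node2vec-style biased random walks over a toy function call graph. It is not the authors' pipeline: the function names in `CALLS`, the helpers `build_graph`, `node2vec_walk`, and `generate_walks`, and the values chosen for `p` and `q` are all hypothetical, and the call edges are treated as undirected for simplicity. In a full node2vec setup, walks like these would typically be fed to a skip-gram model to learn node embeddings; that training step is omitted here.

```python
import random

# Hypothetical call edges (caller, callee) for a toy binary; not from the dataset.
CALLS = [
    ("main", "decrypt_payload"),
    ("main", "connect_c2"),
    ("decrypt_payload", "xor_block"),
    ("connect_c2", "send_data"),
    ("connect_c2", "xor_block"),
]


def build_graph(edges):
    """Build an undirected adjacency map from (caller, callee) pairs."""
    graph = {}
    for caller, callee in edges:
        graph.setdefault(caller, set()).add(callee)
        graph.setdefault(callee, set()).add(caller)
    return graph


def node2vec_walk(graph, start, length, p=1.0, q=1.0, rng=None):
    """Generate one second-order biased walk of at most `length` nodes.

    p is the return parameter (weight 1/p for stepping back to the previous
    node); q is the in-out parameter (weight 1/q for moving to a node that is
    not adjacent to the previous node).
    """
    rng = rng or random.Random()
    walk = [start]
    while len(walk) < length:
        current = walk[-1]
        neighbors = sorted(graph[current])
        if not neighbors:
            break
        if len(walk) == 1:
            # The first step has no previous node, so it is uniform.
            walk.append(rng.choice(neighbors))
            continue
        previous = walk[-2]
        weights = []
        for candidate in neighbors:
            if candidate == previous:
                weights.append(1.0 / p)
            elif candidate in graph[previous]:
                weights.append(1.0)
            else:
                weights.append(1.0 / q)
        walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
    return walk


def generate_walks(graph, num_walks, length, p, q, seed=0):
    """Start `num_walks` walks from every node, in shuffled order."""
    rng = random.Random(seed)
    walks = []
    for _ in range(num_walks):
        nodes = sorted(graph)
        rng.shuffle(nodes)
        for node in nodes:
            walks.append(node2vec_walk(graph, node, length, p, q, rng))
    return walks


if __name__ == "__main__":
    toy_graph = build_graph(CALLS)
    for walk in generate_walks(toy_graph, num_walks=1, length=5, p=0.5, q=2.0):
        print(" -> ".join(walk))
```

With a small `p` the walk tends to revisit the function it just left, while a small `q` pushes it outward toward more distant functions. These two parameters are how node2vec trades local against global structure.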
Pages: 15
Related papers
48 in total
  • [31] DeepCatra: Learning flow- and graph-based behaviours for Android malware detection
    Wu, Yafei
    Shi, Jian
    Wang, Peicheng
    Zeng, Dongrui
    Sun, Cong
    IET INFORMATION SECURITY, 2023, 17 (01) : 118 - 130
  • [32] Analysis of Android malware family characteristic based on isomorphism of sensitive API call graph
    Zhou, Hao
    Zhang, Wei
    Wei, Fengqiong
    Chen, Yunfang
    2017 IEEE SECOND INTERNATIONAL CONFERENCE ON DATA SCIENCE IN CYBERSPACE (DSC), 2017, : 319 - 327
  • [33] Subgraph-Based Adversarial Examples Against Graph-Based IoT Malware Detection Systems
    Abusnaina, Ahmed
    Alasmary, Hisham
    Abuhamad, Mohammed
    Salem, Saeed
    Nyang, DaeHun
    Mohaisen, Aziz
    COMPUTATIONAL DATA AND SOCIAL NETWORKS, 2019, 11917 : 268 - 281
  • [34] POSTER: Breaking Graph-based IoT Malware Detection Systems Using Adversarial Examples
    Abusnaina, Ahmed
    Khormali, Aminollah
    Alasmary, Hisham
    Park, Jeman
    Anwar, Afsah
    Meteriz, Ulku
    Mohaisen, Aziz
    PROCEEDINGS OF THE 2019 CONFERENCE ON SECURITY AND PRIVACY IN WIRELESS AND MOBILE NETWORKS (WISEC '19), 2019, : 290 - 291
  • [35] A Graph-Based Feature Generation Approach in Android Malware Detection with Machine Learning Techniques
    Liu, Xiaojian
    Lei, Qian
    Liu, Kehong
    MATHEMATICAL PROBLEMS IN ENGINEERING, 2020, 2020 (2020)
  • [36] HGDetector: A hybrid Android malware detection method using network traffic and Function call graph
    Feng, Jiayin
    Shen, Limin
    Chen, Zhen
    Lei, Yu
    Li, Hui
    ALEXANDRIA ENGINEERING JOURNAL, 2025, 114 : 30 - 45
  • [37] GSDM: Graph-based Scaling Detection Model in Network Function Virtualization
    Li, Lishan
    Liu, Ying
    Wu, Jianping
    Ren, Gang
    2019 IEEE GLOBAL COMMUNICATIONS CONFERENCE (GLOBECOM), 2019,
  • [38] GSDM: Graph-based scaling detection model in network function virtualization
    Institute for Network Sciences and Cyberspace, Tsinghua University, Beijing, China
    Unknown
    Proc. - IEEE Glob. Commun. Conf., GLOBECOM, 2019,
  • [39] Opcode-level function call graph based android malware classification using deep learning
    Niu, Weina
    Cao, Rong
    Zhang, Xiaosong
    Ding, Kangyi
    Zhang, Kaimeng
    Li, Ting
    Sensors (Switzerland), 2020, 20 (13) : 1 - 23
  • [40] OpCode-Level Function Call Graph Based Android Malware Classification Using Deep Learning
    Niu, Weina
    Cao, Rong
    Zhang, Xiaosong
    Ding, Kangyi
    Zhang, Kaimeng
    Li, Ting
    SENSORS, 2020, 20 (13) : 1 - 23