VulExplainer: A Transformer-Based Hierarchical Distillation for Explaining Vulnerability Types

Cited by: 12
Authors
Fu, Michael [1]
Nguyen, Van [1]
Tantithamthavorn, Chakkrit [1]
Le, Trung [1]
Phung, Dinh [1]
Affiliations
[1] Monash Univ, Fac Informat Technol, Melbourne, Australia
Keywords
Software vulnerability; software security; classification
DOI
10.1109/TSE.2023.3305244
CLC Number
TP31 [Computer software]
Discipline Codes
081202; 0835
Abstract
Deep learning-based vulnerability prediction approaches have been proposed to help under-resourced security practitioners detect vulnerable functions. However, practitioners still do not know what type of vulnerability (i.e., which CWE-ID) a given prediction corresponds to. Thus, an approach that explains the vulnerability type of a given prediction is imperative. In this paper, we propose VulExplainer, an approach to explain vulnerability types, which we formulate as a vulnerability classification task. However, vulnerabilities have diverse characteristics (i.e., CWE-IDs), and the number of labeled samples per CWE-ID is highly imbalanced (a highly imbalanced multi-class classification problem), which often leads to inaccurate predictions. Thus, we introduce a Transformer-based hierarchical distillation approach for software vulnerability classification that addresses these highly imbalanced vulnerability types. Specifically, we split the complex label distribution into sub-distributions based on CWE abstract types (i.e., categorizations that group similar CWE-IDs), so that similar CWE-IDs are grouped together and each group has a more balanced label distribution. We then train a TextCNN teacher on each simplified distribution; however, each teacher performs well only within its own group. We therefore build a Transformer student model that generalizes across all teachers through our hierarchical knowledge distillation framework. In an extensive evaluation on 8,636 real-world vulnerabilities, our approach outperforms all baselines by 5%-29%. The results also demonstrate that our approach can be applied to Transformer-based architectures such as CodeBERT, GraphCodeBERT, and CodeGPT. Moreover, our method is compatible with any Transformer-based model without architectural modifications; it only adds a special distillation token to the input. These results highlight our contributions towards the fundamental and practical problem of explaining software vulnerability types.
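The distillation scheme described in the abstract can be illustrated with a minimal sketch: group-specific teacher models supply soft targets that a single Transformer student, whose input is prefixed with a learnable distillation token, learns to mimic alongside the hard CWE-ID labels. The PyTorch code below is only an illustrative assumption of how such a setup might look, not the authors' implementation; all class names, hyperparameters, and the loss weighting are hypothetical.

# Minimal PyTorch sketch (assumption, not the paper's code) of knowledge distillation
# with a distillation token: a Transformer student learns from hard labels plus the
# softened logits of a group-specific teacher (e.g., a TextCNN trained on one CWE group).
import torch
import torch.nn as nn
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Cross-entropy on hard CWE-ID labels + KL divergence to the softened teacher distribution.
    ce = F.cross_entropy(student_logits, labels)
    kd = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    return alpha * ce + (1.0 - alpha) * kd

class StudentWithDistillToken(nn.Module):
    # Transformer encoder whose input sequence is prefixed with a learnable distillation token.
    def __init__(self, vocab_size=50000, d_model=256, n_classes=40):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.distill_token = nn.Parameter(torch.zeros(1, 1, d_model))
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.classifier = nn.Linear(d_model, n_classes)

    def forward(self, token_ids):
        x = self.embed(token_ids)                           # (batch, seq_len, d_model)
        tok = self.distill_token.expand(x.size(0), -1, -1)  # (batch, 1, d_model)
        h = self.encoder(torch.cat([tok, x], dim=1))        # prepend the distillation token
        return self.classifier(h[:, 0])                     # classify from the distillation token

# Toy usage: teacher_logits would come from the teacher assigned to each sample's CWE group.
student = StudentWithDistillToken()
ids = torch.randint(0, 50000, (4, 128))     # a toy batch of tokenized functions
labels = torch.randint(0, 40, (4,))         # toy CWE-ID class labels
teacher_logits = torch.randn(4, 40)         # placeholder for group-specific teacher outputs
loss = distillation_loss(student(ids), teacher_logits, labels)
loss.backward()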
Pages: 4550-4565
Number of pages: 16