CDNM: Clustering-Based Data Normalization Method For Automated Vulnerability Detection

Authors
Wu, Tongshuai [1, 2]
Chen, Liwei [1, 2]
Du, Gewangzi [1, 2]
Zhu, Chenguang [1, 2]
Cui, Ningning [1, 2]
Shi, Gang [1, 2]
Affiliations
[1] Chinese Acad Sci, Inst Informat Engn, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, Sch Cyber Secur, Beijing, Peoples R China
Source
COMPUTER JOURNAL | 2024 / Vol. 67 / Issue 04
Funding
National Natural Science Foundation of China;
Keywords
Data Normalization; Clustering; Vulnerability Detection; Deep Learning;
DOI
10.1093/comjnl/bxad080
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
The key steps in a deep learning vulnerability detection framework are pre-processing source code and learning vulnerability features. Traditional source code representation techniques fully normalize user-defined symbols and thereby discard semantic information associated with vulnerabilities. The current mainstream model for learning vulnerability features is the Recurrent Neural Network (RNN), whose sequential structure limits its ability to capture long-range information. This paper proposes a new vulnerability detection framework to address these problems. In the source code pre-processing phase, we propose a new data normalization method: user-defined symbols are clustered with the unsupervised K-means algorithm and then normalized by class according to the clustering results, which preserves the primary semantic information in the source code and ensures the smoothness of the sample data. In the feature extraction stage, we feed the text representation of the source code into Bidirectional Encoder Representations from Transformers (BERT) for automatic feature learning, which enhances semantic information extraction and long-range information acquisition. Experimental results show that, on real-world data we collected ourselves, the vulnerability detection precision of this method is 18.3% higher than that of the current mainstream vulnerability detection framework. Further, our method improves on the precision of the state-of-the-art method by 4.2%.
Pages: 1538 - 1549
Page count: 12
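
The clustering-based normalization step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how user-defined symbols are turned into vectors, how many clusters are used, or how the normalized names are formed, so everything below is an assumption made for illustration: the hashed character-bigram features (`embed`), the value `k=3`, the `SYM_<cluster>` placeholder scheme, and the small pure-Python Lloyd's K-means that stands in for whatever implementation the authors used.

```python
import re
import zlib

DIM = 16  # size of the hashed feature vector (illustrative choice)

def embed(name):
    """Map an identifier to a hashed character-bigram count vector (assumed features)."""
    vec = [0.0] * DIM
    padded = f"^{name.lower()}$"
    for i in range(len(padded) - 1):
        # crc32 is used because it is deterministic across runs
        vec[zlib.crc32(padded[i:i + 2].encode()) % DIM] += 1.0
    return vec

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=20):
    """Plain Lloyd's algorithm; the first k points serve as initial centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid for every point
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def build_mapping(identifiers, k):
    """Assign every identifier a placeholder named after its cluster."""
    labels = kmeans([embed(n) for n in identifiers], k)
    return {name: f"SYM_{label}" for name, label in zip(identifiers, labels)}

def normalize(code, mapping):
    """Replace each known identifier in the code with its cluster placeholder."""
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, mapping)) + r")\b")
    return pattern.sub(lambda m: mapping[m.group(1)], code)

# Hypothetical user-defined symbols and code fragment
identifiers = ["dst", "src", "buf_len", "buffer_size", "idx"]
mapping = build_mapping(identifiers, k=3)
snippet = "memcpy(dst, src, buf_len); idx = buffer_size - 1;"
normalized = normalize(snippet, mapping)
print(mapping)
print(normalized)
```

Unlike a full normalization that maps every symbol to a generic name, this keeps one placeholder per cluster, so symbols that land in the same cluster share a name while library calls such as `memcpy` are left untouched. In practice a semantic embedding would likely replace the bigram features used here.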