Software Vulnerabilities Detection Based on a Pre-trained Language Model

Cited: 0
Authors
Xu, Wenlin [1 ]
Li, Tong [2 ]
Wang, Jinsong [3 ]
Duan, Haibo [3 ]
Tang, Yahui [4 ]
Affiliations
[1] Yunnan Univ, Sch Informat Sci & Engn, Kunming, Yunnan, Peoples R China
[2] Yunnan Agr Univ, Sch Big Data, Kunming, Yunnan, Peoples R China
[3] Yunnan Univ Finance & Econ, Informat Management Ctr, Kunming, Yunnan, Peoples R China
[4] Chongqing Univ Posts & Telecommun, Sch Software, Chongqing, Peoples R China
Source
2023 IEEE 22ND INTERNATIONAL CONFERENCE ON TRUST, SECURITY AND PRIVACY IN COMPUTING AND COMMUNICATIONS, TRUSTCOM, BIGDATASE, CSE, EUC, ISCI 2023 | 2024
Keywords
Cyber security; Vulnerability detection; Pre-trained language model; Autoencoder; Outlier detection;
DOI
10.1109/TrustCom60117.2023.00129
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Software vulnerability detection is crucial in cyber security, as it protects software systems from malicious attacks. The majority of earlier techniques relied on security professionals to define software features before training a classification or regression model on those features to find vulnerabilities. However, defining software features and collecting high-quality labeled vulnerabilities for training are both time-consuming. To address these issues, in this paper we propose an unsupervised and effective method for automatically extracting software features and detecting software vulnerabilities. First, we obtain software features by building a new pre-trained BERT model, constructing a C/C++ vocabulary and pre-training on software source code. We then fine-tune the pre-trained BERT model with a deep autoencoder to create low-dimensional embeddings from the software features. Finally, we apply a clustering-based outlier detection method to the embeddings to detect vulnerabilities. We evaluate our method on five datasets of programs written in C/C++; experimental results show that it outperforms state-of-the-art software vulnerability detection methods.
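The abstract describes a three-stage pipeline: BERT features extracted from source code, a deep autoencoder that compresses them into low-dimensional embeddings, and clustering-based outlier detection over those embeddings. Below is a minimal Python sketch of that pipeline, not the authors' implementation: it uses the generic `bert-base-uncased` checkpoint as a stand-in for the paper's C/C++-pre-trained model, keeps BERT frozen rather than fine-tuning it jointly with the autoencoder, and the layer sizes, cluster count, and contamination rate are illustrative assumptions.

```python
import numpy as np
import torch
import torch.nn as nn
from sklearn.cluster import KMeans
from transformers import AutoModel, AutoTokenizer


def encode_snippets(snippets, model_name="bert-base-uncased"):
    """Extract one [CLS] feature vector per code snippet.

    The paper pre-trains its own BERT with a C/C++ vocabulary; the
    generic checkpoint used here is only a placeholder.
    """
    tok = AutoTokenizer.from_pretrained(model_name)
    bert = AutoModel.from_pretrained(model_name).eval()
    with torch.no_grad():
        batch = tok(snippets, padding=True, truncation=True,
                    max_length=512, return_tensors="pt")
        out = bert(**batch)
    return out.last_hidden_state[:, 0, :]  # [CLS] vectors, shape (N, 768)


class DeepAutoencoder(nn.Module):
    """Compress BERT features into a low-dimensional embedding."""

    def __init__(self, in_dim=768, z_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                                     nn.Linear(256, z_dim))
        self.decoder = nn.Sequential(nn.Linear(z_dim, 256), nn.ReLU(),
                                     nn.Linear(256, in_dim))

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), z


def flag_outliers(features, z_dim=32, n_clusters=8, epochs=100,
                  contamination=0.05):
    """Train the autoencoder, cluster the embeddings with k-means, and
    flag the samples farthest from their nearest cluster centre."""
    ae = DeepAutoencoder(features.shape[1], z_dim)
    opt = torch.optim.Adam(ae.parameters(), lr=1e-3)
    for _ in range(epochs):  # plain reconstruction training
        recon, _ = ae(features)
        loss = nn.functional.mse_loss(recon, features)
        opt.zero_grad()
        loss.backward()
        opt.step()
    with torch.no_grad():
        _, z = ae(features)
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(z.numpy())
    dist = km.transform(z.numpy()).min(axis=1)  # distance to nearest centre
    cutoff = np.quantile(dist, 1.0 - contamination)
    return dist > cutoff  # True => suspected vulnerable sample
```

With real data, `flag_outliers(encode_snippets(list_of_functions))` would return a boolean mask over the functions. Note that k-means needs at least `n_clusters` samples, and the distance-to-centroid score here is one simple choice of clustering-based outlier detector; the paper's actual detector may differ.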
Pages: 904-911
Page count: 8