CodeAttack: Code-Based Adversarial Attacks for Pre-trained Programming Language Models

Cited: 0
Authors
Jha, Akshita [1]
Reddy, Chandan K. [1]
Affiliations
[1] Virginia Tech, Dept Comp Sci, Arlington, VA 22203 USA
Keywords
DOI
Not available
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Pre-trained programming language (PL) models (such as CodeT5, CodeBERT, GraphCodeBERT, etc.) have the potential to automate software engineering tasks involving code understanding and code generation. However, these models operate in the natural channel of code, i.e., they are primarily concerned with the human understanding of the code. They are not robust to changes in the input and are thus potentially susceptible to adversarial attacks in the natural channel. We propose CodeAttack, a simple yet effective black-box attack model that uses code structure to generate effective, efficient, and imperceptible adversarial code samples and demonstrates the vulnerabilities of the state-of-the-art PL models to code-specific adversarial attacks. We evaluate the transferability of CodeAttack on several code-code (translation and repair) and code-NL (summarization) tasks across different programming languages. CodeAttack outperforms state-of-the-art adversarial NLP attack models to achieve the best overall drop in performance while being more efficient, imperceptible, consistent, and fluent. The code can be found at https://github.com/reddy-lab-code-research/CodeAttack.
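The sketch below is a minimal illustration (not taken from the paper) of the kind of structure-aware, black-box perturbation the abstract describes: greedily renaming identifiers so the program stays syntactically valid while searching for substitutions that most degrade a victim model's score. The victim scorer, the substitution dictionary, and the greedy search are illustrative assumptions, not CodeAttack's published algorithm.

import ast
import builtins
import keyword
import re
from typing import Callable, Dict, List

def extract_identifiers(code: str) -> List[str]:
    # Rough identifier extraction; a real attack would use a proper tokenizer.
    tokens = set(re.findall(r"\b[A-Za-z_][A-Za-z0-9_]*\b", code))
    return sorted(t for t in tokens
                  if not keyword.iskeyword(t) and t not in dir(builtins))

def rename_identifier(code: str, old: str, new: str) -> str:
    # Whole-word rename; a real attack would respect scoping via the AST.
    return re.sub(rf"\b{re.escape(old)}\b", new, code)

def still_parses(code: str) -> bool:
    # Constraint: the perturbed code must remain syntactically valid.
    try:
        ast.parse(code)
        return True
    except SyntaxError:
        return False

def greedy_attack(code: str,
                  victim_score: Callable[[str], float],
                  substitutes: Dict[str, List[str]],
                  max_edits: int = 2) -> str:
    # Greedily apply the identifier renamings that most reduce the victim
    # model's score, stopping when no substitution degrades it further.
    best_code, best_score = code, victim_score(code)
    for _ in range(max_edits):
        candidate, candidate_score = None, best_score
        for old in extract_identifiers(best_code):
            for new in substitutes.get(old, []):
                perturbed = rename_identifier(best_code, old, new)
                if not still_parses(perturbed):
                    continue
                score = victim_score(perturbed)
                if score < candidate_score:
                    candidate, candidate_score = perturbed, score
        if candidate is None:
            break
        best_code, best_score = candidate, candidate_score
    return best_code

if __name__ == "__main__":
    source = "def add_numbers(total, value):\n    return total + value\n"
    # Toy victim: pretend the model's quality depends on seeing the token 'total'.
    toy_score = lambda c: 1.0 if "total" in c else 0.2
    subs = {"total": ["accum", "t0"], "value": ["v", "item"]}
    print(greedy_attack(source, toy_score, subs))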
Pages: 14892 - 14900
Page count: 9