Generating natural adversarial examples with universal perturbations for text classification

Cited by: 10
|
Authors
Gao, Haoran [1 ,2 ]
Zhang, Hua [1 ]
Yang, Xingguo [1 ]
Li, Wenmin [1 ]
Gao, Fei [1 ]
Wen, Qiaoyan [1 ]
Affiliations
[1] Beijing Univ Posts & Telecommun, State Key Lab Networking & Switching Technol, Beijing 100876, Peoples R China
[2] State Key Lab Cryptol, POB 5159, Beijing 100878, Peoples R China
Funding
National Natural Science Foundation of China;
关键词
Deep neural network; Adversarial examples; Universal perturbations; Text classification; ATTACKS;
D O I
10.1016/j.neucom.2021.10.089
CLC number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recent works have demonstrated the vulnerability of text classifiers to universal adversarial attacks, which splice carefully designed word sequences into the original text. Although these word sequences themselves are natural, the adversarial examples produced by splicing them into the original text are not. In this paper, we propose a framework for generating natural adversarial examples using an adversarially regularized autoencoder (ARAE) model and an inverter model. The framework maps discrete text into a continuous space, obtains adversarial representations by adding universal adversarial perturbations in that space, and then generates natural adversarial examples from them. To achieve universal adversarial attacks, we design a universal adversarial perturbations search (UAPS) algorithm driven by the gradient of the target classifier's loss function. Perturbations found by the UAPS algorithm can be added directly to the continuous representation of the original text. On two textual entailment datasets, we evaluate the fooling rate of the generated adversarial examples against two RNN-based architectures and one Transformer-based architecture. The results show that all three architectures are vulnerable: for example, on the SNLI dataset, the accuracy of the ESIM model on the "entailment" category drops from 88.35% to 2.26%. While achieving a high fooling rate, the generated adversarial examples also score well on naturalness. Further analysis shows that the adversarial examples generated in this paper transfer across neural networks. (c) 2021 Elsevier B.V. All rights reserved.
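The core idea described in the abstract, searching for a single perturbation vector in the continuous space that raises the target classifier's loss on many inputs at once, can be illustrated with a minimal sketch. This is not the paper's UAPS algorithm: it stands in a toy latent space with a fixed linear softmax classifier (the arrays `W`, `b` and the function name `uaps_sketch` are illustrative assumptions), and it performs plain projected gradient ascent on a shared perturbation `v`:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def uaps_sketch(X, y, W, b, eps=1.5, lr=0.1, steps=200):
    """Toy universal-perturbation search: find one vector v, shared by
    every latent vector in X, that increases the classifier's mean
    cross-entropy loss, constrained to an L2 ball of radius eps."""
    n, d = X.shape
    v = np.zeros(d)
    for _ in range(steps):
        logits = (X + v) @ W + b
        p = softmax(logits)
        # gradient of the mean cross-entropy loss w.r.t. the logits
        grad_logits = p.copy()
        grad_logits[np.arange(n), y] -= 1.0
        # chain rule back to v, averaged over the whole batch
        grad_v = (grad_logits @ W.T).mean(axis=0)
        v += lr * grad_v               # ascend the loss
        norm = np.linalg.norm(v)
        if norm > eps:
            v *= eps / norm            # project back onto the eps-ball
    return v
```

In the paper's framework this search would operate on ARAE latent codes and the gradient would come from the target text classifier; the sketch only shows the shared-perturbation ascent-and-project loop.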
Pages: 175 - 182
Number of pages: 8
Related papers
50 records in total
  • [21] Comparative evaluation of recent universal adversarial perturbations in image classification
    Weng, Juanjuan
    Luo, Zhiming
    Lin, Dazhen
    Li, Shaozi
    Computers and Security, 2024, 136
  • [23] Universal adversarial perturbations for multiple classification tasks with quantum classifiers
    Qiu, Yun-Zhong
    MACHINE LEARNING-SCIENCE AND TECHNOLOGY, 2023, 4 (04):
  • [24] Learning to Discriminate Perturbations for Blocking Adversarial Attacks in Text Classification
    Zhou, Yichao
    Jiang, Jyun-Yu
    Chang, Kai-Wei
    Wang, Wei
    2019 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING AND THE 9TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (EMNLP-IJCNLP 2019): PROCEEDINGS OF THE CONFERENCE, 2019, : 4904 - 4913
  • [25] Detecting textual adversarial examples through text modification on text classification systems
    Hyun Kwon
    Sanghyun Lee
    Applied Intelligence, 2023, 53 : 19161 - 19185
  • [27] HotFlip: White-Box Adversarial Examples for Text Classification
    Ebrahimi, Javid
    Rao, Anyi
    Lowd, Daniel
    Dou, Dejing
    PROCEEDINGS OF THE 56TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 2, 2018, : 31 - 36
  • [28] WordRevert: Adversarial Examples Defence Method for Chinese Text Classification
    Xu, Enhui
    Zhang, Xiaolin
    Wang, Yongping
    Zhang, Shuai
    Lu, Lixin
    Xu, Li
    IEEE ACCESS, 2022, 10 : 28832 - 28841
  • [29] WordChange: Adversarial Examples Generation Approach for Chinese Text Classification
    Nuo, Cheng
    Chang, Guo-Qin
    Gao, Haichang
    Pei, Ge
    Zhang, Yang
    IEEE ACCESS, 2020, 8 (08) : 79561 - 79572
  • [30] BAE: BERT-based Adversarial Examples for Text Classification
    Garg, Siddhant
    Ramakrishnan, Goutham
    PROCEEDINGS OF THE 2020 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP), 2020, : 6174 - 6181