Generating natural adversarial examples with universal perturbations for text classification

Cited by: 10
Authors
Gao, Haoran [1,2]
Zhang, Hua [1]
Yang, Xingguo [1]
Li, Wenmin [1]
Gao, Fei [1]
Wen, Qiaoyan [1]
Affiliations
[1] Beijing Univ Posts & Telecommun, State Key Lab Networking & Switching Technol, Beijing 100876, Peoples R China
[2] State Key Lab Cryptol, POB 5159, Beijing 100878, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Deep neural network; Adversarial examples; Universal perturbations; Text classification; Attacks
DOI
10.1016/j.neucom.2021.10.089
CLC number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Recent works have demonstrated the vulnerability of text classifiers to universal adversarial attacks, which splice carefully designed word sequences into the original text. While these word sequences may themselves be natural, the adversarial examples generated by splicing them into the original text are unnatural. In this paper, we propose a framework for generating natural adversarial examples with an adversarially regularized autoencoder (ARAE) model and an inverter model. The framework maps discrete text into a continuous space, obtains continuous representations of adversarial examples by adding universal adversarial perturbations in that space, and then generates natural adversarial examples from these representations. To achieve universal adversarial attacks, we design a universal adversarial perturbations search (UAPS) algorithm that uses the gradient of the target classifier's loss function. Perturbations found by the UAPS algorithm can be added directly to the continuous representation of the original text. On two textual entailment datasets, we evaluate the fooling rate of the generated adversarial examples against two RNN-based architectures and one Transformer-based architecture. The results show that all architectures are vulnerable to the adversarial examples; for example, on the SNLI dataset, the accuracy of the ESIM model on the "entailment" category drops from 88.35% to 2.26%. While achieving a high fooling rate, the generated adversarial examples also remain natural. Further analysis shows that the adversarial examples generated in this paper transfer across neural networks. (c) 2021 Elsevier B.V. All rights reserved.
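To illustrate the search the abstract describes, the following is a minimal, hypothetical PyTorch sketch of a gradient-based universal perturbation search in a continuous latent space, in the spirit of the UAPS algorithm; it is not the authors' code. The encoder (the inverter mapping text to continuous codes), generator (the ARAE decoder), and classifier callables, along with the step count, learning rate, and bound eps, are all assumptions, and the generator is assumed to emit differentiable (soft) token representations so that gradients can flow from the classifier's loss back to the perturbation.

```python
# Minimal sketch: gradient-based search for one universal perturbation
# in a continuous latent space. Module names and hyperparameters are
# illustrative assumptions, not the paper's released implementation.
import torch
import torch.nn.functional as F

def search_universal_perturbation(encoder, generator, classifier,
                                  texts, labels,
                                  steps=100, lr=0.1, eps=1.0):
    """Find a single perturbation that, added to the latent codes of
    many inputs, drives the target classifier away from the labels."""
    with torch.no_grad():
        z = encoder(texts)              # discrete text -> latent codes (N, d)
    delta = torch.zeros_like(z[0:1]).requires_grad_(True)  # one shared (1, d) perturbation
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        # Decode the perturbed codes to (soft) text and classify the result.
        logits = classifier(generator(z + delta))
        # Maximize the classifier's loss: negate it and minimize instead.
        loss = -F.cross_entropy(logits, labels)
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():
            delta.clamp_(-eps, eps)     # keep the perturbation bounded
    return delta.detach()
```

At attack time, the returned perturbation would be added to the latent code of any new input and decoded by the generator, yielding a natural-looking adversarial example without per-input optimization.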
Pages: 175-182
Number of pages: 8
Related papers
50 records in total
  • [41] Generating Adversarial Examples With Shadow Model
    Zhang, Rui
    Xia, Hui
    Hu, Chunqiang
    Zhang, Cheng
    Liu, Chao
    Xiao, Fu
    IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, 2022, 18 (09): 6283-6289
  • [42] A Neural Rejection System Against Universal Adversarial Perturbations in Radio Signal Classification
    Zhang, Lu
    Lambotharan, Sangarapillai
    Zheng, Gan
    Roli, Fabio
    2021 IEEE GLOBAL COMMUNICATIONS CONFERENCE (GLOBECOM), 2021
  • [43] Generating Natural Language Adversarial Examples through Probability Weighted Word Saliency
    Ren, Shuhuai
    Deng, Yihe
    He, Kun
    Che, Wanxiang
    57TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2019), 2019: 1085-1097
  • [44] Delving deep into adversarial perturbations initialization on adversarial examples generation
    Hu, Cong
    Wan, Peng
    Wu, Xiao-Jun
    Yin, He-Feng
    JOURNAL OF ELECTRONIC IMAGING, 2022, 31 (06)
  • [45] Defense against Universal Adversarial Perturbations
    Akhtar, Naveed
    Liu, Jian
    Mian, Ajmal
    2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018: 3389-3398
  • [47] Universal adversarial perturbations generative network
    Wang, Zheng
    Yang, Yang
    Li, Jingjing
    Zhu, Xiaofeng
    WORLD WIDE WEB-INTERNET AND WEB INFORMATION SYSTEMS, 2022, 25 (04): 1725-1746
  • [48] Towards Universal Adversarial Examples and Defenses
    Rakin, Adnan Siraj
    Wang, Ye
    Aeron, Shuchin
    Koike-Akino, Toshiaki
    Moulin, Pierre
    Parsons, Kieran
    2021 IEEE INFORMATION THEORY WORKSHOP (ITW), 2021
  • [49] Fair Classification with Adversarial Perturbations
    Celis, L. Elisa
    Mehrotra, Anay
    Vishnoi, Nisheeth K.
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [50] Generating Adversarial Examples With Distance Constrained Adversarial Imitation Networks
    Tang, Pengfei
    Wang, Wenjie
    Lou, Jian
    Xiong, Li
    IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, 2022, 19 (06): 4145-4155