Generating natural adversarial examples with universal perturbations for text classification

Cited: 10
Authors
Gao, Haoran [1 ,2 ]
Zhang, Hua [1 ]
Yang, Xingguo [1 ]
Li, Wenmin [1 ]
Gao, Fei [1 ]
Wen, Qiaoyan [1 ]
Affiliations
[1] Beijing Univ Posts & Telecommun, State Key Lab Networking & Switching Technol, Beijing 100876, Peoples R China
[2] State Key Lab Cryptol, POB 5159, Beijing 100878, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Deep neural network; Adversarial examples; Universal perturbations; Text classification; ATTACKS;
DOI
10.1016/j.neucom.2021.10.089
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Recent works have demonstrated the vulnerability of text classifiers to universal adversarial attacks, which splice carefully designed word sequences into the original text. Although these word sequences can themselves read naturally, the adversarial examples produced by splicing them into the original text are unnatural. In this paper, we propose a framework for generating natural adversarial examples with an adversarially regularized autoencoder (ARAE) model and an inverter model. The framework maps discrete text into a continuous space, obtains the continuous representation of an adversarial example by adding a universal adversarial perturbation in that space, and then decodes it into a natural adversarial example. To achieve universal adversarial attacks, we design a universal adversarial perturbation search (UAPS) algorithm driven by the gradient of the target classifier's loss function. Perturbations found by the UAPS algorithm can be added directly to the continuous representation of the original text. On two textual entailment datasets, we evaluate the fooling rate of the generated adversarial examples against two RNN-based architectures and one Transformer-based architecture. The results show that all three architectures are vulnerable: on the SNLI dataset, for example, the accuracy of the ESIM model on the "entailment" category drops from 88.35% to 2.26%. While achieving a high fooling rate, the generated adversarial examples also score well on naturalness. Further analysis shows that the generated adversarial examples transfer across neural networks. (c) 2021 Elsevier B.V. All rights reserved.
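The core idea of the UAPS search can be summarized in a short sketch. The Python snippet below is a minimal illustration under stated assumptions, not the authors' code: the encoder and classifier are toy stand-ins for the paper's ARAE encoder/inverter and target text classifier, and the hyperparameters (steps, lr, eps) are hypothetical. It shows only the gradient-driven search for a single perturbation in the continuous space that fools the classifier on every input.

import torch
import torch.nn as nn

# Toy stand-ins: a real implementation would use an ARAE encoder + inverter to
# map text into the continuous space, and the actual target classifier (e.g.,
# ESIM). Dimensions here are arbitrary placeholders.
latent_dim, num_classes = 64, 3
encoder = nn.Linear(300, latent_dim)                   # placeholder text -> latent map
classifier = nn.Sequential(nn.Linear(latent_dim, 128),
                           nn.ReLU(),
                           nn.Linear(128, num_classes))
loss_fn = nn.CrossEntropyLoss()

def uaps(inputs, labels, steps=200, lr=0.1, eps=1.0):
    # Search for ONE perturbation `delta` that raises the classifier's loss
    # when added to the latent code of EVERY input (what makes it "universal"),
    # by following the gradient of the target classifier's loss.
    delta = torch.zeros(latent_dim, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        z = encoder(inputs)               # discrete text -> continuous space
        logits = classifier(z + delta)    # same perturbation for all inputs
        loss = -loss_fn(logits, labels)   # minimizing the negative = ascending the loss
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():
            delta.clamp_(-eps, eps)       # keep the perturbation bounded
    return delta.detach()

# Toy usage: 32 fake sentence representations with random labels. In the paper,
# the perturbed codes z + delta would then be decoded by the ARAE generator
# back into natural-language adversarial examples.
x = torch.randn(32, 300)
y = torch.randint(0, num_classes, (32,))
delta = uaps(x, y)
print("universal perturbation norm:", delta.norm().item())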
Pages: 175-182
Page count: 8
Related Papers
50 records in total
  • [1] AdvExpander: Generating Natural Language Adversarial Examples by Expanding Text
    Shao, Zhihong
    Wu, Zhongqin
    Huang, Minlie
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2022, 30 : 1184 - 1196
  • [2] Universal Adversarial Attacks with Natural Triggers for Text Classification
    Song, Liwei
    Yu, Xinwei
    Peng, Hsuan-Tung
    Narasimhan, Karthik
    2021 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL-HLT 2021), 2021, : 3724 - 3733
  • [3] Adversarial transformation network with adaptive perturbations for generating adversarial examples
    Zhang, Guoyin
    Da, Qingan
    Li, Sizhao
    Sun, Jianguo
    Wang, Wenshan
    Hu, Qing
    Lu, Jiashuai
    INTERNATIONAL JOURNAL OF BIO-INSPIRED COMPUTATION, 2022, 20 (02) : 94 - 103
  • [4] Universal adversarial examples and perturbations for quantum classifiers
    Gong, Weiyuan
    Deng, Dong-Ling
    NATIONAL SCIENCE REVIEW, 2022, 9 (06) : 48 - 55
  • [5] Generating Universal Adversarial Perturbations for Quantum Classifiers
    Anil, Gautham
    Vinod, Vishnu
    Narayan, Apurva
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 10, 2024, : 10891 - 10899
  • [6] Generating Natural Language Adversarial Examples
    Alzantot, Moustafa
    Sharma, Yash
    Elgohary, Ahmed
    Ho, Bo-Jhang
    Srivastava, Mani B.
    Chang, Kai-Wei
    2018 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2018), 2018, : 2890 - 2896
  • [7] TextJuggler: Fooling text classification tasks by generating high-quality adversarial examples
    Peng, Hao
    Wang, Zhe
    Wei, Chao
    Zhao, Dandan
    Xu, Guangquan
    Han, Jianming
    Guo, Shixin
    Zhong, Ming
    Ji, Shouling
    KNOWLEDGE-BASED SYSTEMS, 2024, 300
  • [8] Generating Transferable Adversarial Examples for Speech Classification
    Kim, Hoki
    Park, Jinseong
    Lee, Jaewook
    PATTERN RECOGNITION, 2023, 137
  • [9] Generating Fluent Adversarial Examples for Natural Languages
    Zhang, Huangzhao
    Zhou, Hao
    Miao, Ning
    Li, Lei
    57TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2019), 2019, : 5564 - 5569