Imitation Attacks and Defenses for Black-box Machine Translation Systems

Cited by: 0
Authors
Wallace, Eric [1 ]
Stern, Mitchell [1 ]
Song, Dawn [1 ]
Affiliations
[1] Univ Calif Berkeley, Berkeley, CA 94720 USA
Keywords
DOI
Not available
CLC number
TP18 [Theory of artificial intelligence]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Adversaries may look to steal or attack black-box NLP systems, either for financial gain or to exploit model errors. One setting of particular interest is machine translation (MT), where models have high commercial value and errors can be costly. We investigate possible exploitations of black-box MT systems and explore a preliminary defense against such threats. We first show that MT systems can be stolen by querying them with monolingual sentences and training models to imitate their outputs. Using simulated experiments, we demonstrate that MT model stealing is possible even when imitation models have different input data or architectures than their target models. Applying these ideas, we train imitation models that reach within 0.6 BLEU of three production MT systems on both high-resource and low-resource language pairs. We then leverage the similarity of our imitation models to transfer adversarial examples to the production systems. We use gradient-based attacks that expose inputs which lead to semantically incorrect translations, dropped content, and vulgar model outputs. To mitigate these vulnerabilities, we propose a defense that modifies translation outputs in order to misdirect the optimization of imitation models. This defense degrades the adversary's BLEU score and attack success rate at some cost in the defender's BLEU and inference speed.
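As a concrete illustration of the imitation attack sketched in the abstract, the following Python sketch shows sequence-level distillation against a black-box MT system: monolingual source sentences are sent to the victim, and its translations are used as training labels for a local student model. This is a minimal sketch, not the paper's implementation; the query_blackbox_mt helper and the choice of a pretrained MarianMT student are assumptions made here for illustration.

# Minimal sketch of the query-and-imitate loop (sequence-level distillation)
# described in the abstract. The victim's translations serve as labels.
import torch
from transformers import MarianMTModel, MarianTokenizer

def query_blackbox_mt(sentences):
    # Hypothetical stand-in for the production system's translate API;
    # canned translations are returned here so the sketch runs end to end.
    canned = {
        "The weather is nice today.": "Das Wetter ist heute schoen.",
        "He bought a used car.": "Er hat ein gebrauchtes Auto gekauft.",
    }
    return [canned[s] for s in sentences]

model_name = "Helsinki-NLP/opus-mt-en-de"  # assumed student; any seq2seq works
tokenizer = MarianTokenizer.from_pretrained(model_name)
student = MarianMTModel.from_pretrained(model_name)
optimizer = torch.optim.AdamW(student.parameters(), lr=5e-5)

monolingual_batches = [
    ["The weather is nice today.", "He bought a used car."],
]

student.train()
for batch in monolingual_batches:
    targets = query_blackbox_mt(batch)  # victim outputs become the labels
    inputs = tokenizer(batch, text_target=targets,
                       return_tensors="pt", padding=True, truncation=True)
    loss = student(**inputs).loss  # cross-entropy against the victim's outputs
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

With a trained imitation model in hand, standard white-box gradient-based attacks (for example, token substitutions chosen to increase the student's loss) can be crafted locally and then transferred to the production system, as the abstract describes.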
Pages: 5531-5546
Number of pages: 16
Related Papers
50 records in total
  • [31] A review of black-box adversarial attacks on image classification
    Zhu, Yanfei
    Zhao, Yaochi
    Hu, Zhuhua
    Luo, Tan
    He, Like
    NEUROCOMPUTING, 2024, 610
  • [32] Boosting Black-Box Adversarial Attacks with Meta Learning
    Fu, Junjie
    Sun, Jian
    Wang, Gang
    2022 41ST CHINESE CONTROL CONFERENCE (CCC), 2022: 7308-7313
  • [33] Impossibility of Black-Box Simulation Against Leakage Attacks
    Ostrovsky, Rafail
    Persiano, Giuseppe
    Visconti, Ivan
    ADVANCES IN CRYPTOLOGY, PT II, 2015, 9216: 130-149
  • [34] Reverse Attack: Black-box Attacks on Collaborative Recommendation
    Zhang, Yihe
    Yuan, Xu
    Li, Jin
    Lou, Jiadong
    Chen, Li
    Tzeng, Nian-Feng
    CCS '21: PROCEEDINGS OF THE 2021 ACM SIGSAC CONFERENCE ON COMPUTER AND COMMUNICATIONS SECURITY, 2021: 51-68
  • [35] Curls & Whey: Boosting Black-Box Adversarial Attacks
    Shi, Yucheng
    Wang, Siyu
    Han, Yahong
    2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019: 6512-6520
  • [36] Boundary Defense Against Black-box Adversarial Attacks
    Aithal, Manjushree B.
    Li, Xiaohua
    2022 26TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2022: 2349-2356
  • [37] Knowledge-enhanced Black-box Attacks for Recommendations
    Chen, Jingfan
    Fan, Wenqi
    Zhu, Guanghui
    Zhao, Xiangyu
    Yuan, Chunfeng
    Li, Qing
    Huang, Yihua
    PROCEEDINGS OF THE 28TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, KDD 2022, 2022: 108-117
  • [38] Black-box Adversarial Attacks with Limited Queries and Information
    Ilyas, Andrew
    Engstrom, Logan
    Athalye, Anish
    Lin, Jessy
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 80, 2018
  • [39] Tree of Attacks: Jailbreaking Black-Box LLMs Automatically
    Mehrotra, Anay
    Zampetakis, Manolis
    Kassianik, Paul
    Nelson, Blaine
    Anderson, Hyrum
    Singer, Yaron
    Karbasi, Amin
    arXiv, 2023
  • [40] Black-box adversarial attacks by manipulating image attributes
    Wei, Xingxing
    Guo, Ying
    Li, Bo
    INFORMATION SCIENCES, 2021, 550: 285-296