Imitation Attacks and Defenses for Black-box Machine Translation Systems

Cited by: 0
Authors
Wallace, Eric [1]
Stern, Mitchell [1]
Song, Dawn [1]
Affiliations
[1] Univ Calif Berkeley, Berkeley, CA 94720 USA
Keywords
DOI
None available
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Adversaries may look to steal or attack black-box NLP systems, either for financial gain or to exploit model errors. One setting of particular interest is machine translation (MT), where models have high commercial value and errors can be costly. We investigate possible exploitations of black-box MT systems and explore a preliminary defense against such threats. We first show that MT systems can be stolen by querying them with monolingual sentences and training models to imitate their outputs. Using simulated experiments, we demonstrate that MT model stealing is possible even when imitation models have different input data or architectures than their target models. Applying these ideas, we train imitation models that reach within 0.6 BLEU of three production MT systems on both high-resource and low-resource language pairs. We then leverage the similarity of our imitation models to transfer adversarial examples to the production systems. We use gradient-based attacks that expose inputs that lead to semantically incorrect translations, dropped content, and vulgar model outputs. To mitigate these vulnerabilities, we propose a defense that modifies translation outputs in order to misdirect the optimization of imitation models. This defense degrades the adversary's BLEU score and attack success rate at some cost in the defender's BLEU and inference speed.
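The stealing step the abstract describes can be illustrated with a short sketch: query the victim system with monolingual sentences and fine-tune a student model on the returned translations. This is a minimal illustration, not the authors' implementation; `query_blackbox_mt` is a hypothetical placeholder for a production API, and the Hugging Face checkpoint name is an illustrative assumption (the abstract notes the student's architecture need not match the victim's).

```python
# Minimal sketch of the imitation (model-stealing) loop: distill the victim's
# translations into a student model using only monolingual attacker queries.
import torch
from transformers import MarianMTModel, MarianTokenizer

def query_blackbox_mt(sentence: str) -> str:
    """Hypothetical wrapper around the victim MT API (assumption)."""
    raise NotImplementedError("replace with a real API call")

model_name = "Helsinki-NLP/opus-mt-en-de"  # assumed student checkpoint
tokenizer = MarianTokenizer.from_pretrained(model_name)
student = MarianMTModel.from_pretrained(model_name)
optimizer = torch.optim.AdamW(student.parameters(), lr=5e-5)

# Attacker-chosen monolingual queries; no parallel data is required.
monolingual_corpus = [
    "The weather is nice today.",
    "The committee approved the new budget.",
]

student.train()
for src in monolingual_corpus:
    tgt = query_blackbox_mt(src)  # use the victim's output as the training label
    batch = tokenizer(src, text_target=tgt, return_tensors="pt")
    loss = student(**batch).loss  # cross-entropy against the victim's translation
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```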
Pages: 5531-5546
Page count: 16
Related Papers
50 records
  • [1] Beating White-Box Defenses with Black-Box Attacks
    Kumova, Vera
    Pilat, Martin
    2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2021,
  • [2] Stateful Defenses for Machine Learning Models Are Not Yet Secure Against Black-box Attacks
    Feng, Ryan
    Hooda, Ashish
    Mangaokar, Neal
    Fawaz, Kassem
    Jha, Somesh
    Prakash, Atul
    PROCEEDINGS OF THE 2023 ACM SIGSAC CONFERENCE ON COMPUTER AND COMMUNICATIONS SECURITY, CCS 2023, 2023, : 786 - 800
  • [3] Practical Black-Box Attacks against Machine Learning
    Papernot, Nicolas
    McDaniel, Patrick
    Goodfellow, Ian
    Jha, Somesh
    Celik, Z. Berkay
    Swami, Ananthram
    PROCEEDINGS OF THE 2017 ACM ASIA CONFERENCE ON COMPUTER AND COMMUNICATIONS SECURITY (ASIA CCS'17), 2017, : 506 - 519
  • [4] Translate Your Gibberish: Black-Box Adversarial Attack on Machine Translation Systems
    Chertkov, A.
    Tsymboi, O.
    Pautov, M.
    Oseledets, I.
    Journal of Mathematical Sciences, 2024, 285 (2) : 221 - 233
  • [5] THE LITTLE BLACK-BOX FOR THE BODY'S DEFENSES
    ADER, R
    PSYCHOLOGY TODAY, 1981, 15 (08) : 92 - 92
  • [6] Unsupervised Machine Translation Quality Estimation in Black-Box Setting
    Huang, Hui
    Di, Hui
    Xu, Jin'an
    Ouchi, Kazushige
    Chen, Yufeng
    MACHINE TRANSLATION, CCMT 2020, 2020, 1328 : 24 - 36
  • [7] Simple Black-box Adversarial Attacks
    Guo, Chuan
    Gardner, Jacob R.
    You, Yurong
    Wilson, Andrew Gordon
    Weinberger, Kilian Q.
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 97, 2019
  • [8] Understanding Pre-Editing for Black-Box Neural Machine Translation
    Miyata, Rei
    Fujita, Atsushi
    16TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (EACL 2021), 2021, : 1539 - 1550
  • [9] Ensemble adversarial black-box attacks against deep learning systems
    Hang, Jie
    Han, Keji
    Chen, Hui
    Li, Yun
    PATTERN RECOGNITION, 2020, 101
  • [10] Black-Box Data Poisoning Attacks on Crowdsourcing
    Chen, Pengpeng
    Yang, Yongqiang
    Yang, Dingqi
    Sun, Hailong
    Chen, Zhijun
    Lin, Peng
    PROCEEDINGS OF THE THIRTY-SECOND INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2023, 2023, : 2975 - 2983