Imitation Attacks and Defenses for Black-box Machine Translation Systems

Cited by: 0
Authors
Wallace, Eric [1 ]
Stern, Mitchell [1 ]
Song, Dawn [1 ]
Affiliations
[1] Univ Calif Berkeley, Berkeley, CA 94720 USA
Keywords
DOI
Not available
CLC number
TP18 [Theory of artificial intelligence]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Adversaries may look to steal or attack black-box NLP systems, either for financial gain or to exploit model errors. One setting of particular interest is machine translation (MT), where models have high commercial value and errors can be costly. We investigate possible exploitations of black-box MT systems and explore a preliminary defense against such threats. We first show that MT systems can be stolen by querying them with monolingual sentences and training models to imitate their outputs. Using simulated experiments, we demonstrate that MT model stealing is possible even when imitation models have different input data or architectures than their target models. Applying these ideas, we train imitation models that reach within 0.6 BLEU of three production MT systems on both high-resource and low-resource language pairs. We then leverage the similarity of our imitation models to transfer adversarial examples to the production systems. We use gradient-based attacks that expose inputs which lead to semantically incorrect translations, dropped content, and vulgar model outputs. To mitigate these vulnerabilities, we propose a defense that modifies translation outputs in order to misdirect the optimization of imitation models. This defense degrades the adversary's BLEU score and attack success rate at some cost in the defender's BLEU and inference speed.
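As a concrete illustration of the imitation attack sketched in the abstract, the following Python sketch shows sequence-level distillation against a black-box MT system: monolingual source sentences are sent to the victim, and its translations are used as training labels for a local student model. This is a minimal sketch, not the paper's implementation; the query_blackbox_mt helper and the choice of a pretrained MarianMT student are assumptions made here for illustration.

# Minimal sketch of the query-and-imitate loop (sequence-level distillation)
# described in the abstract. The victim's translations serve as labels.
import torch
from transformers import MarianMTModel, MarianTokenizer

def query_blackbox_mt(sentences):
    # Hypothetical stand-in for the production system's translate API;
    # canned translations are returned here so the sketch runs end to end.
    canned = {
        "The weather is nice today.": "Das Wetter ist heute schoen.",
        "He bought a used car.": "Er hat ein gebrauchtes Auto gekauft.",
    }
    return [canned[s] for s in sentences]

model_name = "Helsinki-NLP/opus-mt-en-de"  # assumed student; any seq2seq works
tokenizer = MarianTokenizer.from_pretrained(model_name)
student = MarianMTModel.from_pretrained(model_name)
optimizer = torch.optim.AdamW(student.parameters(), lr=5e-5)

monolingual_batches = [
    ["The weather is nice today.", "He bought a used car."],
]

student.train()
for batch in monolingual_batches:
    targets = query_blackbox_mt(batch)  # victim outputs become the labels
    inputs = tokenizer(batch, text_target=targets,
                       return_tensors="pt", padding=True, truncation=True)
    loss = student(**inputs).loss  # cross-entropy against the victim's outputs
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

With a trained imitation model in hand, standard white-box gradient-based attacks (for example, token substitutions chosen to increase the student's loss) can be crafted locally and then transferred to the production system, as the abstract describes.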
Pages: 5531-5546
Number of pages: 16
Related Papers
50 records in total
  • [31] A review of black-box adversarial attacks on image classification
    Zhu, Yanfei
    Zhao, Yaochi
    Hu, Zhuhua
    Luo, Tan
    He, Like
    NEUROCOMPUTING, 2024, 610
  • [32] Boosting Black-Box Adversarial Attacks with Meta Learning
    Fu, Junjie
    Sun, Jian
    Wang, Gang
    2022 41ST CHINESE CONTROL CONFERENCE (CCC), 2022: 7308-7313
  • [33] Impossibility of Black-Box Simulation Against Leakage Attacks
    Ostrovsky, Rafail
    Persiano, Giuseppe
    Visconti, Ivan
    ADVANCES IN CRYPTOLOGY, PT II, 2015, 9216: 130-149
  • [34] Reverse Attack: Black-box Attacks on Collaborative Recommendation
    Zhang, Yihe
    Yuan, Xu
    Li, Jin
    Lou, Jiadong
    Chen, Li
    Tzeng, Nian-Feng
    CCS '21: PROCEEDINGS OF THE 2021 ACM SIGSAC CONFERENCE ON COMPUTER AND COMMUNICATIONS SECURITY, 2021: 51-68
  • [35] Curls & Whey: Boosting Black-Box Adversarial Attacks
    Shi, Yucheng
    Wang, Siyu
    Han, Yahong
    2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019: 6512-6520
  • [36] Boundary Defense Against Black-box Adversarial Attacks
    Aithal, Manjushree B.
    Li, Xiaohua
    2022 26TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2022: 2349-2356
  • [37] Knowledge-enhanced Black-box Attacks for Recommendations
    Chen, Jingfan
    Fan, Wenqi
    Zhu, Guanghui
    Zhao, Xiangyu
    Yuan, Chunfeng
    Li, Qing
    Huang, Yihua
    PROCEEDINGS OF THE 28TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, KDD 2022, 2022: 108-117
  • [38] Black-box Adversarial Attacks with Limited Queries and Information
    Ilyas, Andrew
    Engstrom, Logan
    Athalye, Anish
    Lin, Jessy
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 80, 2018
  • [39] Tree of Attacks: Jailbreaking Black-Box LLMs Automatically
    Mehrotra, Anay
    Zampetakis, Manolis
    Kassianik, Paul
    Nelson, Blaine
    Anderson, Hyrum
    Singer, Yaron
    Karbasi, Amin
    arXiv, 2023
  • [40] Black-box adversarial attacks by manipulating image attributes
    Wei, Xingxing
    Guo, Ying
    Li, Bo
    INFORMATION SCIENCES, 2021, 550: 285-296