Deep learning;
BERT;
Transformers;
Adversarial machine learning;
DOI:
10.1016/j.simpa.2022.100310
CLC Classification Number:
TP31 [Computer Software];
Discipline Classification Code:
081202;
0835;
Abstract:
Transformer models built on attention-based architectures have been highly successful at establishing state-of-the-art results in natural language processing (NLP). However, recent work on the adversarial robustness of attention-based models shows that they are susceptible to adversarial inputs that cause spurious outputs, raising questions about the trustworthiness of such models. In this paper, we present BERT Probe, a Python package for evaluating robustness to attention-attribution-based character-level and word-level evasion attacks and for empirically quantifying potential vulnerabilities in sequence classification tasks. Additionally, BERT Probe provides two out-of-the-box defenses against character-level, attention-attribution-based evasion attacks.
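The record does not show BERT Probe's own interface, so the following is only a minimal sketch of the kind of attack the abstract describes: a character-level evasion attack guided by attention attribution, applied to a Hugging Face BERT sequence classifier. The checkpoint name and the helper functions (attention_ranking, char_swap, attack) are illustrative assumptions, not BERT Probe's API.

# Minimal sketch (assumptions, not the BERT Probe API): an attention-attribution-guided
# character-level evasion attack against a Hugging Face BERT sequence classifier.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_NAME = "textattack/bert-base-uncased-SST-2"  # assumed example checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, output_attentions=True)
model.eval()

def attention_ranking(text):
    """Rank input tokens by the attention mass they receive in the last layer."""
    enc = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        out = model(**enc)
    # Last-layer attention has shape (1, heads, seq, seq); average over heads and query positions.
    attn = out.attentions[-1].mean(dim=1).mean(dim=1).squeeze(0)
    tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0].tolist())
    return sorted(zip(tokens, attn.tolist()), key=lambda p: p[1], reverse=True)

def char_swap(word):
    """Character-level perturbation: swap two adjacent characters in the middle of a word."""
    if len(word) < 3:
        return word
    i = len(word) // 2
    return word[:i - 1] + word[i] + word[i - 1] + word[i + 1:]

def attack(text):
    """Perturb the most-attended token, skipping special tokens and subword continuations."""
    special = set(tokenizer.all_special_tokens)
    target = next(tok for tok, _ in attention_ranking(text)
                  if tok not in special and not tok.startswith("##"))
    # Assumes the tokenized form matches the surface form (e.g. lower-cased input text).
    return text.replace(target, char_swap(target), 1)

if __name__ == "__main__":
    print(attack("the movie was absolutely wonderful and heartwarming."))

Per the abstract, the package's two built-in defenses target character-level attacks of exactly this kind; their concrete mechanisms are not described in this record.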
Affiliation:
ELKH Inst Comp Sci & Control SZTAK, Informat Lab, Budapest, Hungary
Budapest Univ Technol & Econ, Fac Elect Engn & Informat, Dept Automat & Appl Informat, Budapest, Hungary
Acs, Judit
Hamerlik, Endre
Affiliation:
ELKH Inst Comp Sci & Control SZTAK, Informat Lab, Budapest, Hungary
Comenius Univ, Fac Math Phys & Informat, Dept Appl Informat, Bratislava, Slovakia
Affiliation:
Univ Washington, Paul G Allen Sch Comp Sci & Engn, Seattle, WA USA
Allen Inst Artificial Intelligence, Seattle, WA USA
Affiliation:
Univ Arizona, Appl Math GIDP, Tucson, AZ 85721 USA
Univ Arizona, Data Divers Lab, Tucson, AZ 85721 USA
Affiliation:
Univ Ljubljana, Fac Comp & Informat Sci, Vecna Pot 113, Ljubljana 1000, Slovenia