Evaluating Neural Model Robustness for Machine Comprehension

Cited by: 0
Authors
Wu, Winston [1 ]
Arendt, Dustin [2 ]
Volkova, Svitlana [3 ]
Affiliations
[1] Johns Hopkins Univ, Dept Comp Sci, Ctr Language & Speech Proc, Baltimore, MD 21218 USA
[2] Pacific Northwest Natl Lab, Visual Analyt Grp, Richland, WA USA
[3] Pacific Northwest Natl Lab, Data Sci & Analyt Grp, Richland, WA USA
Keywords: (none listed)
DOI: none
Chinese Library Classification: TP18 [Theory of Artificial Intelligence]
Discipline codes: 081104; 0812; 0835; 1405
Abstract
We evaluate neural model robustness to adversarial attacks using different types of linguistic unit perturbations (character- and word-level), and propose a new method for strategic sentence-level perturbations. We experiment with different amounts of perturbation to examine model confidence and misclassification rate, and contrast model performance with different embeddings (BERT and ELMo) on two benchmark datasets (SQuAD and TriviaQA). We demonstrate how to improve model performance during an adversarial attack by using ensembles. Finally, we analyze factors that affect model behavior under adversarial attack and develop a new model to predict errors during attacks. Our novel findings reveal that (a) unlike BERT, models that use ELMo embeddings are more susceptible to adversarial attacks; (b) unlike word- and paraphrase-level perturbations, character-level perturbations affect the model the most but are most easily compensated for by adversarial training; (c) word-level perturbations lead to more high-confidence misclassifications than sentence- and character-level perturbations; (d) question type and model answer length (the longer the answer, the more likely it is to be incorrect) are the most predictive of model errors in an adversarial setting; and (e) conclusions about model behavior are dataset-specific.
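The abstract's character-level perturbations can be illustrated with a minimal sketch. This is not the authors' implementation; the function name, the adjacent-character-swap strategy, and the perturbation rate parameter are assumptions chosen for illustration.

```python
import random

def perturb_characters(text: str, rate: float = 0.1, seed: int = 0) -> str:
    """Swap one pair of adjacent characters in randomly chosen words.

    Illustrative sketch only: the paper's actual perturbation
    strategies and rates are not reproduced here.
    """
    rng = random.Random(seed)
    perturbed = []
    for word in text.split():
        # Only perturb words long enough to swap interior characters.
        if len(word) > 3 and rng.random() < rate:
            i = rng.randrange(1, len(word) - 2)
            word = word[:i] + word[i + 1] + word[i] + word[i + 2:]
        perturbed.append(word)
    return " ".join(perturbed)
```

With `rate=0.0` the input passes through unchanged; higher rates perturb more words, letting one examine model confidence as a function of perturbation amount, as the abstract describes.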
Pages: 2470-2481 (12 pages)
Related Papers (items 41-50 of 50)
  • [41] Evaluating project robustness through the lens of the business model
    Regin, Justin M.
    PICMET '07: PORTLAND INTERNATIONAL CENTER FOR MANAGEMENT OF ENGINEERING AND TECHNOLOGY, VOLS 1-6, PROCEEDINGS: MANAGEMENT OF CONVERGING TECHNOLOGIES, 2007, : 2043 - 2048
  • [42] NICGSlowDown: Evaluating the Efficiency Robustness of Neural Image Caption Generation Models
    Chen, Simin
    Song, Zihe
    Haque, Mirazul
    Liu, Cong
    Yang, Wei
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 15344 - 15353
  • [43] Evaluating Neural Network Robustness for Melanoma Classification using Mutual Information
    O'Brien, Molly
    Bukowski, Julia
    Hager, Greg
    Pezeshk, Aria
    Unberath, Mathias
    MEDICAL IMAGING 2022: IMAGE PROCESSING, 2022, 12032
  • [44] Evaluating Robustness to Noise and Compression of Deep Neural Networks for Keyword Spotting
    Pereira, Pedro H.
    Beccaro, Wesley
    Ramirez, Miguel A.
    IEEE ACCESS, 2023, 11 : 53224 - 53236
  • [45] Machine Reading Comprehension Model for Chinese Word Segmentation
Zhou Y.
    Chen Y.
    Huang R.
    Qin Y.
    Lin C.
    Hsi-An Chiao Tung Ta Hsueh/Journal of Xi'an Jiaotong University, 2022, 56 (08): : 95 - 103
  • [46] Building machine reading comprehension model from scratch
    Yang, Zijian Gyozo
    Ligeti-Nagy, Noemi
    ANNALES MATHEMATICAE ET INFORMATICAE, 2023, 57 : 107 - 123
  • [47] DIM Reader: Dual Interaction Model for Machine Comprehension
    Liu, Zhuang
    Huang, Degen
    Huang, Kaiyu
    Zhang, Jing
    CHINESE COMPUTATIONAL LINGUISTICS AND NATURAL LANGUAGE PROCESSING BASED ON NATURALLY ANNOTATED BIG DATA, CCL 2017, 2017, 10565 : 387 - 397
  • [48] Bridge inspection named entity recognition via BERT and lexicon augmented machine reading comprehension neural model
    Li, Ren
    Mo, Tianjin
    Yang, Jianxi
    Li, Dong
    Jiang, Shixin
    Wang, Di
    ADVANCED ENGINEERING INFORMATICS, 2021, 50
  • [49] Chinese machine reading comprehension based on deep learning neural network
    Ma, Chao
    An, Jing
    Xu, Jing
    Xu, Binchen
    Xu, Luyuan
    Bai, Xiang-En
    INTERNATIONAL JOURNAL OF BIO-INSPIRED COMPUTATION, 2023, 21 (03) : 137 - 147
  • [50] Towards Robust Neural Machine Reading Comprehension via Question Paraphrases
    Li, Ying
    Li, Hongyu
    Liu, Jing
    PROCEEDINGS OF THE 2019 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING (IALP), 2019, : 290 - 295