A resampling-based method to evaluate NLI models

Authors
Salvatore, Felipe de Souza [1]
Finger, Marcelo [1]
Hirata Jr, Roberto [1]
Patriota, Alexandre G. [1]
Affiliations
[1] Univ Sao Paulo, Inst Matemat & Estat, Sao Paulo, Brazil
Funding
São Paulo Research Foundation, Brazil;
Keywords
Textual entailment; Text classification; Statistical methods; Machine learning; Evaluation;
DOI
10.1017/S1351324923000268
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
The recent progress of deep learning techniques has produced models capable of achieving high scores on traditional Natural Language Inference (NLI) datasets. To understand the generalization limits of these powerful models, an increasing number of adversarial evaluation schemes have appeared. These works use a similar evaluation method: they construct a new NLI test set based on sentences with known logical and semantic properties (the adversarial set), train a model on a benchmark NLI dataset, and evaluate it on the new set. Poor performance on the adversarial set is identified as a model limitation. The problem with this evaluation procedure is that it may only indicate a sampling problem. A machine learning model can perform poorly on a new test set because the text patterns presented in the adversarial set are not well represented in the training sample. To address this problem, we present a new evaluation method, the Invariance under Equivalence test (IE test). The IE test trains a model with sufficient adversarial examples and checks the model's performance on two equivalent datasets. As a case study, we apply the IE test to state-of-the-art NLI models using synonym substitution as the form of adversarial examples. The experiment shows that, despite their high predictive power, these models usually produce different inference outputs for equivalent inputs, and, more importantly, this deficiency cannot be solved by adding adversarial observations to the training data.
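The core idea of comparing a model's performance on two equivalent datasets can be illustrated with a small resampling sketch. This is not the procedure from the paper, whose details are not given in the abstract; it is a generic sign-flip permutation test on paired per-example correctness, and the function name `paired_sign_flip_test` and its parameters are hypothetical. It assumes each test example has an original and an equivalent (for instance, synonym-substituted) version, and that we already know whether the model classified each one correctly.

```python
import random


def paired_sign_flip_test(correct_a, correct_b, n_resamples=10_000, seed=0):
    """Two-sided sign-flip permutation test on paired correctness indicators.

    correct_a[i] and correct_b[i] are 0/1 flags saying whether the model was
    right on example i in the original and in the equivalent dataset.
    Returns (absolute accuracy gap, estimated p-value).
    Hypothetical helper for illustration only; not the paper's IE test.
    """
    if len(correct_a) != len(correct_b):
        raise ValueError("paired inputs must have the same length")
    n = len(correct_a)
    rng = random.Random(seed)

    # Per-example difference in correctness: -1, 0, or +1.
    diffs = [int(a) - int(b) for a, b in zip(correct_a, correct_b)]
    observed = abs(sum(diffs)) / n

    # Under the null hypothesis (the transformation does not matter), the
    # sign of each non-zero difference is equally likely to be + or -.
    nonzero = [d for d in diffs if d != 0]
    extreme = 0
    for _ in range(n_resamples):
        total = sum(d if rng.random() < 0.5 else -d for d in nonzero)
        if abs(total) / n >= observed:
            extreme += 1

    # Add-one correction keeps the estimated p-value strictly positive.
    p_value = (extreme + 1) / (n_resamples + 1)
    return observed, p_value


if __name__ == "__main__":
    original = [1] * 100               # model correct on all original inputs
    equivalent = [1] * 70 + [0] * 30   # but wrong on 30 equivalent inputs
    gap, p = paired_sign_flip_test(original, equivalent)
    print(f"accuracy gap = {gap:.2f}, p-value = {p:.4f}")
```

A small p-value here would suggest that the model's accuracy is not invariant under the transformation; in a setting like the one the abstract describes, the model would first be trained with transformed examples so that a gap cannot be attributed to sampling alone.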
Pages: 793-820
Number of pages: 28