A resampling-based method to evaluate NLI models

Authors
Salvatore, Felipe de Souza [1]
Finger, Marcelo [1]
Hirata Jr, Roberto [1]
Patriota, Alexandre G. [1]
Affiliation
[1] Univ Sao Paulo, Inst Matemat & Estat, Sao Paulo, Brazil
Funding
São Paulo Research Foundation (FAPESP)
Keywords
Textual entailment; Text classification; Statistical methods; Machine learning; Evaluation
DOI
10.1017/S1351324923000268
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Recent progress in deep learning has produced models that achieve high scores on traditional Natural Language Inference (NLI) datasets. To understand the generalization limits of these powerful models, a growing number of adversarial evaluation schemes have been proposed. These works share a common evaluation method: they construct a new NLI test set from sentences with known logical and semantic properties (the adversarial set), train a model on a benchmark NLI dataset, and evaluate it on the new set. Poor performance on the adversarial set is then taken as evidence of a model limitation. The problem with this procedure is that poor performance may indicate only a sampling problem: a machine learning model can perform poorly on a new test set simply because the text patterns in the adversarial set are not well represented in the training sample. To address this problem, we present a new evaluation method, the Invariance under Equivalence test (IE test). The IE test trains a model with sufficient adversarial examples and checks the model's performance on two equivalent datasets. As a case study, we apply the IE test to state-of-the-art NLI models, using synonym substitution as the form of adversarial examples. The experiment shows that, despite their high predictive power, these models usually produce different inference outputs for equivalent inputs and, more importantly, that this deficiency cannot be solved by adding adversarial observations to the training data.
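Comparing a model's performance on two equivalent test sets (for example, an original NLI test set and its synonym-substituted counterpart) can be framed as a resampling-based hypothesis test. The sketch below shows one generic way to do this, a paired bootstrap on per-example correctness. It is an illustrative assumption, not the authors' IE test: the function name `paired_bootstrap_pvalue`, the centered (shifted) bootstrap null, and the add-one p-value correction are hypothetical choices, and the paper's actual procedure, including how adversarial examples are added to training, may differ.

```python
import random
from typing import Sequence


def paired_bootstrap_pvalue(
    correct_a: Sequence[bool],
    correct_b: Sequence[bool],
    n_resamples: int = 10_000,
    seed: int = 0,
) -> tuple[float, float]:
    """Hypothetical two-sided paired bootstrap test of H0: acc(A) == acc(B).

    correct_a[i] and correct_b[i] record whether the model classified
    example i correctly in test set A (original) and in its equivalent
    version B (e.g. after synonym substitution). Pairs are resampled
    jointly so the pairing between A and B is preserved.
    Returns (observed accuracy difference acc(A) - acc(B), p-value).
    """
    if len(correct_a) != len(correct_b) or not correct_a:
        raise ValueError("need two non-empty, equally long, paired lists")
    n = len(correct_a)
    # Per-pair difference in correctness: +1, 0 or -1.
    diffs = [int(a) - int(b) for a, b in zip(correct_a, correct_b)]
    observed = sum(diffs) / n
    # Shift the differences to mean 0 so resampling simulates H0.
    centered = [d - observed for d in diffs]
    rng = random.Random(seed)
    extreme = 0
    for _ in range(n_resamples):
        # Draw n pairs with replacement and average their centered differences.
        sample_mean = sum(rng.choice(centered) for _ in range(n)) / n
        if abs(sample_mean) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    p_value = (extreme + 1) / (n_resamples + 1)
    return observed, p_value
```

Given, for each test pair, whether the model was right on the original item and on its equivalent transformed item, a small p-value suggests that the accuracy gap between the two equivalent sets is unlikely to be resampling noise alone. Resampling pairs jointly, rather than each set independently, is what accounts for the fact that both sets contain the same underlying examples.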
Pages: 793–820
Number of pages: 28