A resampling-based method to evaluate NLI models

Cited by: 0
Authors
Salvatore, Felipe de Souza [1 ]
Finger, Marcelo [1 ]
Hirata Jr, Roberto [1 ]
Patriota, Alexandre G. [1 ]
Affiliations
[1] Univ Sao Paulo, Inst Matemat & Estat, Sao Paulo, Brazil
Funding
São Paulo Research Foundation (FAPESP), Brazil;
Keywords
Textual entailment; Text classification; Statistical methods; Machine learning; Evaluation;
DOI
10.1017/S1351324923000268
Chinese Library Classification (CLC)
TP18 [Theory of artificial intelligence];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The recent progress of deep learning techniques has produced models capable of achieving high scores on traditional Natural Language Inference (NLI) datasets. To understand the generalization limits of these powerful models, an increasing number of adversarial evaluation schemes have appeared. These works use a similar evaluation method: they construct a new NLI test set based on sentences with known logic and semantic properties (the adversarial set), train a model on a benchmark NLI dataset, and evaluate it on the new set. Poor performance on the adversarial set is identified as a model limitation. The problem with this evaluation procedure is that it may only indicate a sampling problem: a machine learning model can perform poorly on a new test set because the text patterns presented in the adversarial set are not well represented in the training sample. To address this problem, we present a new evaluation method, the Invariance under Equivalence test (IE test). The IE test trains a model with sufficient adversarial examples and checks the model's performance on two equivalent datasets. As a case study, we apply the IE test to state-of-the-art NLI models, using synonym substitution as the form of adversarial examples. The experiment shows that, despite their high predictive power, these models usually produce different inference outputs for equivalent inputs, and, more importantly, this deficiency cannot be solved by adding adversarial observations to the training data.
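The quantity at the heart of the abstract's argument is how often a model's predicted NLI label survives a meaning-preserving rewrite of its input. The sketch below is illustrative only and does not reproduce the authors' code: `predict`, `invariance_rate`, and the toy data are all hypothetical names introduced here to show the kind of comparison the IE test performs between a test set and its synonym-substituted equivalent.

```python
# Hedged sketch of the invariance-under-equivalence idea: given a model's
# label predictions on a test set and on an equivalent set built by synonym
# substitution, report how often the two predictions agree.

def invariance_rate(predict, original_pairs, equivalent_pairs):
    """Fraction of examples whose predicted NLI label is unchanged
    when the input is replaced by a semantically equivalent one."""
    assert len(original_pairs) == len(equivalent_pairs)
    same = sum(
        predict(p1, h1) == predict(p2, h2)
        for (p1, h1), (p2, h2) in zip(original_pairs, equivalent_pairs)
    )
    return same / len(original_pairs)

# Toy stand-in "model" (labels by hypothesis length parity) -- purely to
# make the sketch runnable, not a real NLI classifier.
def toy_predict(premise, hypothesis):
    return "entailment" if len(hypothesis) % 2 == 0 else "neutral"

originals = [("A man eats.", "A person eats."), ("It rains.", "Water falls.")]
# Hand-made "equivalent" inputs standing in for synonym substitution.
equivalents = [("A man dines.", "A person dines."), ("It pours.", "Water drops.")]

rate = invariance_rate(toy_predict, originals, equivalents)
# rate = 0.5: the toy model changes its label on one of the two pairs,
# which is exactly the failure mode the IE test is designed to expose.
```

A real application would replace `toy_predict` with a trained NLI model and build the equivalent set systematically, as the paper does with synonym substitution.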
Pages: 793-820
Number of pages: 28
Related papers
50 records in total
  • [1] Resampling-based methods for biologists
    Fieberg, John R.
    Vitense, Kelsey
    Johnson, Douglas H.
    PEERJ, 2020, 8
  • [2] Resampling-based noise correction for crowdsourcing
    Xu, Wenqiang
    Jiang, Liangxiao
    Li, Chaoqun
    JOURNAL OF EXPERIMENTAL & THEORETICAL ARTIFICIAL INTELLIGENCE, 2021, 33 (06) : 985 - 999
  • [3] Resampling-Based Change Point Estimation
    Fiosina, Jelena
    Fiosins, Maksims
    ADVANCES IN INTELLIGENT DATA ANALYSIS X: IDA 2011, 2011, 7014 : 150 - 161
  • [4] Resampling-based selective clustering ensembles
    Hong, Yi
    Kwong, Sam
    Wang, Hanli
    Ren, Qingsheng
    PATTERN RECOGNITION LETTERS, 2009, 30 (03) : 298 - 305
  • [5] A Resampling-Based Stochastic Approximation Method for Analysis of Large Geostatistical Data
    Liang, Faming
    Cheng, Yichen
    Song, Qifan
    Park, Jincheol
    Yang, Ping
    JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2013, 108 (501) : 325 - 339
  • [6] Extraction of mismatch negativity using a resampling-based spatial filtering method
    Lin, Yanfei
    Wu, Wei
    Wu, Chaohua
    Liu, Baolin
    Gao, Xiaorong
    JOURNAL OF NEURAL ENGINEERING, 2013, 10 (02)
  • [7] Consensus Clustering: A Resampling-based Method for Building Radiation Hybrid Maps
    Seetan, Raed I.
    Bible, Jacob
    Karavias, Michael
    Seitan, Wael
    Thangiah, Sam
    2016 15TH IEEE INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS (ICMLA 2016), 2016, : 240 - 245
  • [8] Resampling-based efficient shrinkage method for non-smooth minimands
    Xu, Jinfeng
    JOURNAL OF NONPARAMETRIC STATISTICS, 2013, 25 (03) : 731 - 743
  • [9] Resampling-based Variable Selection with Lasso for p >> n and Partially Linear Models
    Mares, Mihaela A.
    Guo, Yike
    2015 IEEE 14TH INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS (ICMLA), 2015, : 1076 - 1082