Evaluating Natural Language Inference Models: A Metamorphic Testing Approach

Cited by: 2
Authors
Jiang, Mingyue [1 ]
Bao, Houzhen [1 ]
Tu, Kaiyi [1 ]
Zhang, Xiao-Yi [2 ]
Ding, Zuohua [1 ]
Affiliations
[1] Zhejiang Sci-Tech University, Hangzhou, People's Republic of China
[2] National Institute of Informatics, Tokyo, Japan
Keywords
Natural Language Inference; Metamorphic Testing; Metamorphic Relation; Quality Evaluation; Oracle Problem;
DOI
10.1109/ISSRE52982.2021.00033
CLC number
TP31 [Computer Software];
Discipline codes
081202; 0835;
Abstract
Natural language inference (NLI) is a fundamental NLP task that forms the cornerstone of deep natural language understanding. Unfortunately, evaluating NLI models is challenging. On one hand, due to the lack of test oracles, it is difficult to automatically judge the correctness of an NLI model's predictions. On the other hand, beyond knowing how well a model performs, there is a further need to understand the capabilities and characteristics of different NLI models. To mitigate these issues, we propose to apply the technique of metamorphic testing (MT) to NLI. We identify six categories of metamorphic relations, covering a wide range of properties that the NLI task is expected to satisfy. On this basis, MT can be conducted on NLI models without test oracles, and the MT results can characterize NLI models' capabilities from multiple aspects. We further demonstrate the validity and effectiveness of our approach through experiments on five NLI models. Our experiments expose a large number of prediction failures in the subject NLI models and also reveal common characteristics shared across them.
Pages: 220-230 (11 pages)
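The core idea of the abstract, checking an NLI model against a metamorphic relation (MR) instead of a per-input oracle, can be sketched as follows. This is a minimal illustration, not the paper's actual harness or MRs: the model stub `toy_nli_model` and the specific MR (contradiction is symmetric, i.e., if the model labels (premise, hypothesis) as "contradiction", swapping the two sentences should still yield "contradiction") are assumptions chosen for demonstration.

```python
# Metamorphic testing sketch for an NLI model (illustrative only).
# MR checked: contradiction is symmetric. No oracle for any single
# prediction is needed; only consistency across source and follow-up
# test cases is verified.

def toy_nli_model(premise: str, hypothesis: str) -> str:
    """Stand-in for a real NLI model; returns one of
    'entailment', 'contradiction', 'neutral'. Deliberately
    order-sensitive so the MR exposes a failure."""
    table = {
        ("A man is sleeping.", "A man is awake."): "contradiction",
        ("A man is awake.", "A man is sleeping."): "neutral",  # inconsistent
    }
    return table.get((premise, hypothesis), "neutral")

def check_symmetry_mr(model, premise: str, hypothesis: str) -> bool:
    """True iff the symmetry MR for contradiction holds on this pair."""
    source_label = model(premise, hypothesis)
    if source_label != "contradiction":
        return True  # this MR only constrains contradiction pairs
    followup_label = model(hypothesis, premise)  # follow-up test case
    return followup_label == "contradiction"

pairs = [("A man is sleeping.", "A man is awake.")]
failures = [p for p in pairs if not check_symmetry_mr(toy_nli_model, *p)]
print(f"MR violations: {len(failures)}")  # prints: MR violations: 1
```

In a real setting, `toy_nli_model` would wrap an actual classifier, and each of the paper's six MR categories would contribute its own transformation of the source input and consistency check on the outputs.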