Evaluating Natural Language Inference Models: A Metamorphic Testing Approach

Cited by: 2
Authors
Jiang, Mingyue [1]
Bao, Houzhen [1]
Tu, Kaiyi [1]
Zhang, Xiao-Yi [2]
Ding, Zuohua [1]
Affiliations
[1] Zhejiang Sci-Tech University, Hangzhou, People's Republic of China
[2] National Institute of Informatics, Tokyo, Japan
Keywords
Natural Language Inference; Metamorphic Testing; Metamorphic Relation; Quality Evaluation; Oracle Problem
DOI
10.1109/ISSRE52982.2021.00033
Chinese Library Classification number
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
Natural language inference (NLI) is a fundamental NLP task that forms the cornerstone of deep natural language understanding. Unfortunately, evaluating NLI models is challenging. On the one hand, because test oracles are lacking, it is difficult to automatically judge whether an NLI model's predictions are correct. On the other hand, beyond knowing how well a model performs, there is a further need to understand the capabilities and characteristics of different NLI models. To mitigate these issues, we propose applying metamorphic testing (MT) to NLI. We identify six categories of metamorphic relations, covering a wide range of properties that the NLI task is expected to satisfy. Based on these relations, MT can be conducted on NLI models without test oracles, and the MT results help interpret the models' capabilities from various aspects. We further demonstrate the validity and effectiveness of our approach through experiments on five NLI models. The experiments expose a large number of prediction failures in the subject NLI models and also yield interpretations of characteristics common to NLI models.
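As a rough illustration of how a metamorphic relation can stand in for a test oracle, the sketch below checks one simple relation on an NLI model. The relation used here (a predicted contradiction should stay a contradiction when premise and hypothesis are swapped) is an assumption chosen for illustration and is not claimed to be one of the six categories identified in the paper. The names `contradiction_symmetry_violations` and `toy_model` are hypothetical, and `toy_model` is a deliberately flawed stand-in for a real model.

```python
from typing import Callable, List, Tuple

Label = str  # "entailment", "contradiction", or "neutral"
NLIModel = Callable[[str, str], Label]  # (premise, hypothesis) -> label


def contradiction_symmetry_violations(
    model: NLIModel, pairs: List[Tuple[str, str]]
) -> List[Tuple[str, str, Label]]:
    """Return the pairs on which the model violates the illustrative relation.

    Relation (an assumption for this sketch): if the source input (p, h) is
    predicted as "contradiction", the follow-up input (h, p) should be too.
    No ground-truth label is consulted at any point.
    """
    violations = []
    for premise, hypothesis in pairs:
        source_output = model(premise, hypothesis)
        if source_output != "contradiction":
            continue  # the relation only constrains contradiction outputs
        follow_up_output = model(hypothesis, premise)
        if follow_up_output != "contradiction":
            violations.append((premise, hypothesis, follow_up_output))
    return violations


def toy_model(premise: str, hypothesis: str) -> Label:
    """Hypothetical stand-in for a real NLI model, deliberately asymmetric."""
    if " not " in f" {hypothesis} " and " not " not in f" {premise} ":
        return "contradiction"
    if premise == hypothesis:
        return "entailment"
    return "neutral"


pairs = [
    ("The cat is asleep", "The cat is not asleep"),
    ("A man plays guitar", "A man plays guitar"),
]
violations = contradiction_symmetry_violations(toy_model, pairs)
print(violations)
```

The check compares the model's output on a source input with its output on a derived follow-up input, so a failure can be reported without knowing the correct label for either input.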
Pages: 220-230
Number of pages: 11