Testing Your Question Answering Software via Asking Recursively

Cited by: 13
Authors
Chen, Songqiang [1]
Jin, Shuo [1]
Xie, Xiaoyuan [1]
Affiliations
[1] Wuhan Univ, Sch Comp Sci, Wuhan, Peoples R China
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China;
Keywords
question answering; testing and validation; recursive metamorphic testing; natural language processing;
DOI
10.1109/ASE51524.2021.9678670
Chinese Library Classification (CLC) number
TP31 [Computer Software];
Discipline classification code
081202 ; 0835 ;
Abstract
Question Answering (QA) is an attractive and challenging area in the NLP community. Diverse algorithms have been proposed, and various benchmark datasets with different topics and task formats have been constructed. QA software is now also widely used in daily life. However, current QA software is mainly tested in a reference-based paradigm, in which the expected outputs (labels) of test cases must be annotated with considerable human effort before testing. As a result, neither just-in-time testing during usage nor extensible testing on massive unlabeled real-life data is feasible, which keeps the current testing of QA software from being flexible and sufficient. In this paper, we propose a method, QAAskeR, with three novel Metamorphic Relations for testing QA software. QAAskeR does not require annotated labels; instead, it tests QA software by checking its behavior on multiple recursively asked questions that relate to the same knowledge. Experimental results show that QAAskeR can reveal violations in over 80% of valid cases without using any pre-annotated labels. Diverse answering issues, especially limited generalization on question types across datasets, are revealed on a state-of-the-art QA algorithm.
Pages: 104-116
Page count: 13
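
Illustrative sketch
The abstract describes testing without labels by asking a follow-up question built from the system's own answer and checking that the two answers agree. The Python sketch below shows that general idea only; it is not the paper's three Metamorphic Relations. The toy QA functions, the `make_follow_up` callback, and the substring-based comparison are all assumptions made for illustration.

```python
from typing import Callable

# Assumed interface: a QA system maps (question, context) to an answer string.
QASystem = Callable[[str, str], str]


def normalize(text: str) -> str:
    """Lower-case, trim surrounding punctuation, and collapse whitespace."""
    return " ".join(text.lower().strip(" .?!").split())


def recursive_ask_violation(
    qa: QASystem,
    context: str,
    question: str,
    make_follow_up: Callable[[str], str],
    expected_in_follow_up: str,
) -> bool:
    """Return True if the follow-up answer is inconsistent with the first one.

    No gold label for `question` is needed: the follow-up question is built
    from the system's own first answer, and its expected answer is a piece of
    the original question (hypothetical, simplified relation).
    """
    first_answer = qa(question, context)
    follow_up = make_follow_up(first_answer)
    second_answer = qa(follow_up, context)
    return normalize(expected_in_follow_up) not in normalize(second_answer)


# Toy stand-ins for real QA software (assumptions; they ignore the context).
def consistent_qa(question: str, context: str) -> str:
    answers = {
        "Who wrote Hamlet?": "Shakespeare",
        "What did Shakespeare write?": "Hamlet",
    }
    return answers.get(question, "")


def inconsistent_qa(question: str, context: str) -> str:
    answers = {"Who wrote Hamlet?": "Shakespeare"}
    return answers.get(question, "")  # cannot answer the follow-up


context = "Hamlet is a tragedy written by Shakespeare."
follow_up = lambda answer: f"What did {answer} write?"

ok = recursive_ask_violation(
    consistent_qa, context, "Who wrote Hamlet?", follow_up, "Hamlet")
bad = recursive_ask_violation(
    inconsistent_qa, context, "Who wrote Hamlet?", follow_up, "Hamlet")
```

Here `ok` is False (no violation) and `bad` is True (a violation is flagged). A real setup would generate the follow-up question automatically and use a softer answer comparison than substring matching.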