Testing Your Question Answering Software via Asking Recursively

Cited by: 13
Authors:
Chen, Songqiang [1]
Jin, Shuo [1]
Xie, Xiaoyuan [1]
Affiliations:
[1] Wuhan Univ, Sch Comp Sci, Wuhan, Peoples R China
Funding:
National Natural Science Foundation of China; National Key Research and Development Program of China
Keywords:
question answering; testing and validation; recursive metamorphic testing; natural language processing
DOI:
10.1109/ASE51524.2021.9678670
Chinese Library Classification (CLC):
TP31 [Computer Software]
Discipline codes:
081202; 0835
Abstract:
Question Answering (QA) is an attractive and challenging area in the NLP community. Diverse algorithms have been proposed, and various benchmark datasets covering different topics and task formats have been constructed. QA software is also now widely used in daily life. However, current QA software is mainly tested in a reference-based paradigm, in which the expected outputs (labels) of test cases must be annotated with considerable human effort before testing. As a result, neither just-in-time testing during usage nor extensible testing on massive unlabeled real-life data is feasible, which keeps the current testing of QA software from being flexible and sufficient. In this paper, we propose a method, QAAskeR, with three novel Metamorphic Relations for testing QA software. QAAskeR does not require annotated labels; instead, it tests QA software by checking its behavior on multiple recursively asked questions that relate to the same knowledge. Experimental results show that QAAskeR can reveal violations on over 80% of valid cases without using any pre-annotated labels. Diverse answering issues, especially limited generalization on question types across datasets, are revealed on a state-of-the-art QA algorithm.
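The reference-free testing idea described in the abstract can be sketched as a minimal metamorphic check: ask a wh-question, substitute the system's answer into a yes/no follow-up question about the same fact, and flag a violation if the system fails to confirm its own answer. No expected-output label is needed at any point. Everything below (`toy_qa`, the fact table, the single question template) is an illustrative assumption for this sketch, not the paper's actual Metamorphic Relations or implementation.

```python
import re

# Stand-in QA "system" backed by a tiny fact table. Purely illustrative;
# in practice the test target would be a real QA model or service.
FACTS = {"Hamlet": "Shakespeare", "War and Peace": "Tolstoy"}

def toy_qa(question: str) -> str:
    m = re.match(r"Who wrote (.+)\?$", question)
    if m:  # wh-question: look the work up in the fact table
        return FACTS.get(m.group(1), "unknown")
    m = re.match(r"Did (.+) write (.+)\?$", question)
    if m:  # yes/no follow-up: verify the (person, work) pair
        person, work = m.group(1), m.group(2)
        return "yes" if FACTS.get(work) == person else "no"
    return "unknown"

def check_recursive_mr(qa, question: str) -> bool:
    """Sketch of one metamorphic relation: the answer to a wh-question,
    substituted into a yes/no question about the same knowledge, should
    be confirmed by the same system. Returns True if the relation holds
    (no violation), using no pre-annotated label."""
    answer = qa(question)
    work = question[len("Who wrote "):]          # e.g. "Hamlet?"
    follow_up = f"Did {answer} write {work}"     # recursive follow-up
    return qa(follow_up) == "yes"

print(check_recursive_mr(toy_qa, "Who wrote Hamlet?"))  # prints True
```

A self-consistent system passes the check; an inconsistency between the two answers is reported as a violation even though no ground-truth label was ever supplied, which is the core of the reference-free paradigm.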
Pages: 104 - 116
Page count: 13