Testing Your Question Answering Software via Asking Recursively

Cited by: 13
Authors:
Chen, Songqiang [1]
Jin, Shuo [1]
Xie, Xiaoyuan [1]
Affiliations:
[1] Wuhan Univ, Sch Comp Sci, Wuhan, Peoples R China
Funding:
National Natural Science Foundation of China; National Key Research and Development Program of China
Keywords:
question answering; testing and validation; recursive metamorphic testing; natural language processing
DOI:
10.1109/ASE51524.2021.9678670
Chinese Library Classification (CLC):
TP31 [Computer Software]
Discipline codes:
081202; 0835
Abstract:
Question Answering (QA) is an attractive and challenging area in the NLP community. Diverse algorithms have been proposed, and various benchmark datasets covering different topics and task formats have been constructed. QA software is also now widely used in daily life. However, current QA software is mainly tested in a reference-based paradigm, in which the expected outputs (labels) of test cases must be annotated with considerable human effort before testing. As a result, neither just-in-time testing during usage nor extensible testing on massive unlabeled real-life data is feasible, which keeps the current testing of QA software from being flexible and sufficient. In this paper, we propose a method, QAAskeR, with three novel Metamorphic Relations for testing QA software. QAAskeR does not require annotated labels; instead, it tests QA software by checking its behavior on multiple recursively asked questions that relate to the same knowledge. Experimental results show that QAAskeR can reveal violations on over 80% of valid cases without using any pre-annotated labels. Diverse answering issues, especially limited generalization on question types across datasets, are revealed on a state-of-the-art QA algorithm.
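The reference-free testing idea described in the abstract can be sketched as a minimal metamorphic check: ask a wh-question, substitute the system's answer into a yes/no follow-up question about the same fact, and flag a violation if the system fails to confirm its own answer. No expected-output label is needed at any point. Everything below (`toy_qa`, the fact table, the single question template) is an illustrative assumption for this sketch, not the paper's actual Metamorphic Relations or implementation.

```python
import re

# Stand-in QA "system" backed by a tiny fact table. Purely illustrative;
# in practice the test target would be a real QA model or service.
FACTS = {"Hamlet": "Shakespeare", "War and Peace": "Tolstoy"}

def toy_qa(question: str) -> str:
    m = re.match(r"Who wrote (.+)\?$", question)
    if m:  # wh-question: look the work up in the fact table
        return FACTS.get(m.group(1), "unknown")
    m = re.match(r"Did (.+) write (.+)\?$", question)
    if m:  # yes/no follow-up: verify the (person, work) pair
        person, work = m.group(1), m.group(2)
        return "yes" if FACTS.get(work) == person else "no"
    return "unknown"

def check_recursive_mr(qa, question: str) -> bool:
    """Sketch of one metamorphic relation: the answer to a wh-question,
    substituted into a yes/no question about the same knowledge, should
    be confirmed by the same system. Returns True if the relation holds
    (no violation), using no pre-annotated label."""
    answer = qa(question)
    work = question[len("Who wrote "):]          # e.g. "Hamlet?"
    follow_up = f"Did {answer} write {work}"     # recursive follow-up
    return qa(follow_up) == "yes"

print(check_recursive_mr(toy_qa, "Who wrote Hamlet?"))  # prints True
```

A self-consistent system passes the check; an inconsistency between the two answers is reported as a violation even though no ground-truth label was ever supplied, which is the core of the reference-free paradigm.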
Pages: 104 - 116
Page count: 13