Evaluating the evaluation: A case study using the TREC 2002 question answering track

Cited by: 0

Authors:

Voorhees, EM [1]

Affiliations:

[1] Natl Inst Stand & Technol, Gaithersburg, MD 20899 USA

Source:

HLT-NAACL 2003: HUMAN LANGUAGE TECHNOLOGY CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, PROCEEDINGS OF THE MAIN CONFERENCE | 2003

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Evaluating competing technologies on a common problem set is a powerful way to improve the state of the art and hasten technology transfer. Yet poorly designed evaluations can waste research effort or even mislead researchers with faulty conclusions. Thus it is important to examine the quality of a new evaluation task to establish its reliability. This paper provides an example of one such assessment by analyzing the task within the TREC 2002 question answering track. The analysis demonstrates that comparative results from the new task are stable, and empirically estimates the size of the difference required between scores to confidently conclude that two runs are different.

Pages: 260-267

Page count: 8
