Evaluating the evaluation: A case study using the TREC 2002 question answering track

Author
Voorhees, EM [1]
Affiliation
[1] Natl Inst Stand & Technol, Gaithersburg, MD 20899 USA
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Evaluating competing technologies on a common problem set is a powerful way to improve the state of the art and hasten technology transfer. Yet poorly designed evaluations can waste research effort or even mislead researchers with faulty conclusions. Thus it is important to examine the quality of a new evaluation task to establish its reliability. This paper provides an example of one such assessment by analyzing the task within the TREC 2002 question answering track. The analysis demonstrates that comparative results from the new task are stable, and empirically estimates the size of the difference required between scores to confidently conclude that two runs are different.
Pages: 260-267
Page count: 8