Evaluating the evaluation: A case study using the TREC 2002 question answering track

Authors
Voorhees, EM [1]
Affiliation
[1] Natl Inst Stand & Technol, Gaithersburg, MD 20899 USA
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Evaluating competing technologies on a common problem set is a powerful way to improve the state of the art and hasten technology transfer. Yet poorly designed evaluations can waste research effort or even mislead researchers with faulty conclusions. Thus it is important to examine the quality of a new evaluation task to establish its reliability. This paper provides an example of one such assessment by analyzing the task within the TREC 2002 question answering track. The analysis demonstrates that comparative results from the new task are stable, and empirically estimates the size of the difference required between scores to confidently conclude that two runs are different.
Pages: 260-267
Page count: 8