Evaluating the evaluation: A case study using the TREC 2002 question answering track

Cited by: 0

Authors:

Voorhees, EM [1]

Affiliation:

[1] Natl Inst Stand & Technol, Gaithersburg, MD 20899 USA

Source:

HLT-NAACL 2003: HUMAN LANGUAGE TECHNOLOGY CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, PROCEEDINGS OF THE MAIN CONFERENCE | 2003

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Evaluating competing technologies on a common problem set is a powerful way to improve the state of the art and hasten technology transfer. Yet poorly designed evaluations can waste research effort or even mislead researchers with faulty conclusions. Thus it is important to examine the quality of a new evaluation task to establish its reliability. This paper provides an example of one such assessment by analyzing the task within the TREC 2002 question answering track. The analysis demonstrates that comparative results from the new task are stable, and empirically estimates the size of the difference required between scores to confidently conclude that two runs are different.

Pages: 260-267

Page count: 8
