The Effect of Assessor Errors on IR System Evaluation

Cited by: 0
Authors
Carterette, Ben [1]
Soboroff, Ian [1]
Affiliations
[1] Univ Delaware, Dept Comp & Informat Sci, Newark, DE 19716 USA
Keywords
assessor error; retrieval test collections
DOI
Not available
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
Recent efforts in test collection building have focused on scaling back the number of necessary relevance judgments and then scaling up the number of search topics. Since the largest source of variation in a Cranfield-style experiment comes from the topics, this is a reasonable approach. However, as topic set sizes grow, and researchers look to crowdsourcing and Amazon's Mechanical Turk to collect relevance judgments, we are faced with issues of quality control. This paper examines the robustness of the TREC Million Query track methods when some assessors make significant and systematic errors. We find that while averages are robust, assessor errors can have a large effect on system rankings.
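The robustness question raised in the abstract can be illustrated with a small simulation. The sketch below is hypothetical and not the paper's actual methodology: it builds synthetic relevance judgments, scores a set of made-up systems, flips a fraction of judgments to mimic systematic assessor error, and compares the resulting system rankings with Kendall's tau. All names (`qrels`, `sys0`..., the 20% relevance rate, the 30% error rate) are illustrative assumptions.

```python
import random

def kendall_tau(a, b):
    """Kendall tau between two strict rankings of the same items."""
    pos_b = {item: i for i, item in enumerate(b)}
    concordant = discordant = 0
    for i in range(len(a)):
        for j in range(i + 1, len(a)):
            # Pair (a[i], a[j]) is concordant if b orders it the same way.
            if pos_b[a[i]] < pos_b[a[j]]:
                concordant += 1
            else:
                discordant += 1
    return (concordant - discordant) / (concordant + discordant)

random.seed(0)
n_topics, n_docs, n_systems = 50, 100, 10

# "True" binary relevance judgments: qrels[topic][doc] in {False, True}.
qrels = [[random.random() < 0.2 for _ in range(n_docs)]
         for _ in range(n_topics)]

# Each hypothetical system returns a random permutation of docs per topic.
systems = {
    f"sys{s}": [random.sample(range(n_docs), n_docs) for _ in range(n_topics)]
    for s in range(n_systems)
}

def score(run, judgments, depth=10):
    """Mean precision@depth over all topics."""
    return sum(
        sum(judgments[t][d] for d in run[t][:depth]) / depth
        for t in range(n_topics)
    ) / n_topics

def rank(judgments):
    """Systems ordered best-to-worst under the given judgments."""
    return sorted(systems, key=lambda s: score(systems[s], judgments),
                  reverse=True)

# Mimic assessor error by flipping a fraction of the judgments.
error_rate = 0.3
noisy = [[(not r) if random.random() < error_rate else r for r in topic]
         for topic in qrels]

tau = kendall_tau(rank(qrels), rank(noisy))
print(f"Kendall tau between clean and noisy system rankings: {tau:.2f}")
```

A tau near 1.0 would mean the ranking is robust to this error model; lower values indicate rank swaps of the kind the paper reports. Varying `error_rate` per assessor, rather than globally, would more closely model the "systematic errors by some assessors" scenario the abstract describes.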
Pages: 539-546 (8 pages)
Related Papers (50 total)
  • [1] The effect of inter-assessor disagreement on IR system evaluation: A case study with lancers and students
    Sakai, Tetsuya
    CEUR Workshop Proceedings, 2017, 2008 : 31 - 38
  • [2] Considering Assessor Agreement in IR Evaluation
    Maddalena, Eddy
    Roitero, Kevin
    Demartini, Gianluca
    Mizzaro, Stefano
    ICTIR'17: PROCEEDINGS OF THE 2017 ACM SIGIR INTERNATIONAL CONFERENCE THEORY OF INFORMATION RETRIEVAL, 2017, : 75 - 82
  • [3] THE ASSESSOR PRE-TEST MARKET EVALUATION SYSTEM
    URBAN, GL
    KATZ, GM
    HATCH, TE
    SILK, AJ
    INTERFACES, 1983, 13 (06) : 38 - 59
  • [4] Evaluation on effect of errors on haptic information in wireless communication system
    Marugame, T
    Kamikura, A
    Ohnishi, M
    Nakagawa, M
    IROS 2003: PROCEEDINGS OF THE 2003 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS, VOLS 1-4, 2003, : 2950 - 2955
  • [5] On Topic Difficulty in IR Evaluation: The Effect of Systems, Corpora, and System Components
    Zampieri, Fabio
    Roitero, Kevin
    Culpepper, J. Shane
    Kurland, Oren
    Mizzaro, Stefano
    PROCEEDINGS OF THE 42ND INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL (SIGIR '19), 2019, : 909 - 912
  • [6] Compensating Position Measurement Errors for the IR Static Triangulation System
    Ciezkowski, Maciej
    Wolniakowski, Adam
    ADVANCES IN SERVICE AND INDUSTRIAL ROBOTICS, RAAD 2018, 2019, 67 : 660 - 668
  • [7] Evaluation of assessor performance in sensory analysis
    Piggott, JR
    Hunter, EA
    ITALIAN JOURNAL OF FOOD SCIENCE, 1999, 11 (04) : 289 - 303
  • [8] User Variability and IR System Evaluation
    Bailey, Peter
    Moffat, Alistair
    Scholer, Falk
    Thomas, Paul
    SIGIR 2015: PROCEEDINGS OF THE 38TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, 2015, : 625 - 634
  • [9] The instrumentation for the evaluation: The teacher assessor toolkit
    Charles, Patrick
    REVUE DES SCIENCES DE L EDUCATION, 2016, 42 (01): : 194 - 195