The Effect of Assessor Errors on IR System Evaluation

Cited by: 0
Authors
Carterette, Ben [1]
Soboroff, Ian [1]
Affiliations
[1] Univ Delaware, Dept Comp & Informat Sci, Newark, DE 19716 USA
Keywords
assessor error; retrieval test collections
DOI
Not available
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
Recent efforts in test collection building have focused on scaling back the number of necessary relevance judgments and then scaling up the number of search topics. Since the largest source of variation in a Cranfield-style experiment comes from the topics, this is a reasonable approach. However, as topic set sizes grow, and researchers look to crowdsourcing and Amazon's Mechanical Turk to collect relevance judgments, we are faced with issues of quality control. This paper examines the robustness of the TREC Million Query track methods when some assessors make significant and systematic errors. We find that while averages are robust, assessor errors can have a large effect on system rankings.
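The robustness question raised in the abstract can be illustrated with a small simulation. The sketch below is hypothetical and not the paper's actual methodology: it builds synthetic relevance judgments, scores a set of made-up systems, flips a fraction of judgments to mimic systematic assessor error, and compares the resulting system rankings with Kendall's tau. All names (`qrels`, `sys0`..., the 20% relevance rate, the 30% error rate) are illustrative assumptions.

```python
import random

def kendall_tau(a, b):
    """Kendall tau between two strict rankings of the same items."""
    pos_b = {item: i for i, item in enumerate(b)}
    concordant = discordant = 0
    for i in range(len(a)):
        for j in range(i + 1, len(a)):
            # Pair (a[i], a[j]) is concordant if b orders it the same way.
            if pos_b[a[i]] < pos_b[a[j]]:
                concordant += 1
            else:
                discordant += 1
    return (concordant - discordant) / (concordant + discordant)

random.seed(0)
n_topics, n_docs, n_systems = 50, 100, 10

# "True" binary relevance judgments: qrels[topic][doc] in {False, True}.
qrels = [[random.random() < 0.2 for _ in range(n_docs)]
         for _ in range(n_topics)]

# Each hypothetical system returns a random permutation of docs per topic.
systems = {
    f"sys{s}": [random.sample(range(n_docs), n_docs) for _ in range(n_topics)]
    for s in range(n_systems)
}

def score(run, judgments, depth=10):
    """Mean precision@depth over all topics."""
    return sum(
        sum(judgments[t][d] for d in run[t][:depth]) / depth
        for t in range(n_topics)
    ) / n_topics

def rank(judgments):
    """Systems ordered best-to-worst under the given judgments."""
    return sorted(systems, key=lambda s: score(systems[s], judgments),
                  reverse=True)

# Mimic assessor error by flipping a fraction of the judgments.
error_rate = 0.3
noisy = [[(not r) if random.random() < error_rate else r for r in topic]
         for topic in qrels]

tau = kendall_tau(rank(qrels), rank(noisy))
print(f"Kendall tau between clean and noisy system rankings: {tau:.2f}")
```

A tau near 1.0 would mean the ranking is robust to this error model; lower values indicate rank swaps of the kind the paper reports. Varying `error_rate` per assessor, rather than globally, would more closely model the "systematic errors by some assessors" scenario the abstract describes.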
Pages: 539-546 (8 pages)
Related Papers (50 total)
  • [1] The effect of inter-assessor disagreement on IR system evaluation: A case study with lancers and students
    Sakai, Tetsuya
    CEUR Workshop Proceedings, 2017, 2008 : 31 - 38
  • [2] Considering Assessor Agreement in IR Evaluation
    Maddalena, Eddy
    Roitero, Kevin
    Demartini, Gianluca
    Mizzaro, Stefano
    ICTIR'17: PROCEEDINGS OF THE 2017 ACM SIGIR INTERNATIONAL CONFERENCE THEORY OF INFORMATION RETRIEVAL, 2017, : 75 - 82
  • [3] THE ASSESSOR PRE-TEST MARKET EVALUATION SYSTEM
    URBAN, GL
    KATZ, GM
    HATCH, TE
    SILK, AJ
    INTERFACES, 1983, 13 (06) : 38 - 59
  • [4] Evaluation on effect of errors on haptic information in wireless communication system
    Marugame, T
    Kamikura, A
    Ohnishi, M
    Nakagawa, M
    IROS 2003: PROCEEDINGS OF THE 2003 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS, VOLS 1-4, 2003, : 2950 - 2955
  • [5] On Topic Difficulty in IR Evaluation: The Effect of Systems, Corpora, and System Components
    Zampieri, Fabio
    Roitero, Kevin
    Culpepper, J. Shane
    Kurland, Oren
    Mizzaro, Stefano
    PROCEEDINGS OF THE 42ND INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL (SIGIR '19), 2019, : 909 - 912
  • [6] Compensating Position Measurement Errors for the IR Static Triangulation System
    Ciezkowski, Maciej
    Wolniakowski, Adam
    ADVANCES IN SERVICE AND INDUSTRIAL ROBOTICS, RAAD 2018, 2019, 67 : 660 - 668
  • [7] Evaluation of assessor performance in sensory analysis
    Piggott, JR
    Hunter, EA
    ITALIAN JOURNAL OF FOOD SCIENCE, 1999, 11 (04) : 289 - 303
  • [8] User Variability and IR System Evaluation
    Bailey, Peter
    Moffat, Alistair
    Scholer, Falk
    Thomas, Paul
    SIGIR 2015: PROCEEDINGS OF THE 38TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, 2015, : 625 - 634
  • [9] The instrumentation for the evaluation: The teacher assessor toolkit
    Charles, Patrick
    REVUE DES SCIENCES DE L EDUCATION, 2016, 42 (01): : 194 - 195