The Effect of Assessor Errors on IR System Evaluation

Times cited: 0
Authors
Carterette, Ben [1]
Soboroff, Ian [1]
Affiliation
[1] Univ Delaware, Dept Comp & Informat Sci, Newark, DE 19716 USA
Keywords
assessor error; retrieval test collections
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Recent efforts in test collection building have focused on scaling back the number of necessary relevance judgments and then scaling up the number of search topics. Since the largest source of variation in a Cranfield-style experiment comes from the topics, this is a reasonable approach. However, as topic set sizes grow, and researchers look to crowdsourcing and Amazon's Mechanical Turk to collect relevance judgments, we are faced with issues of quality control. This paper examines the robustness of the TREC Million Query track methods when some assessors make significant and systematic errors. We find that while averages are robust, assessor errors can have a large effect on system rankings.
Pages: 539-546
Number of pages: 8
Related papers (50 in total)
  • [21] The effect of pooling and evaluation depth on IR metrics
    Lu, Xiaolu
    Moffat, Alistair
    Culpepper, J. Shane
    INFORMATION RETRIEVAL JOURNAL, 2016, 19 (04): 416-445
  • [22] COULD THE PRESENT TOWNSHIP ASSESSOR SYSTEM BE IMPROVED?
    Ives, Henry A. S.
    BULLETIN OF THE NATIONAL TAX ASSOCIATION, 1918, 3 (09): 219-227
  • [23] Legal System and Legal Translation: Juror or Assessor?
    Wang, Lu
    2019 4TH INTERNATIONAL CONFERENCE ON EDUCATION SCIENCE AND DEVELOPMENT (ICESD 2019), 2019
  • [24] MEASURING SYSTEM ELIMINATES EFFECT OF POSITIONING ERRORS
    STEFANID.EJ
    DESIGN NEWS, 1971, 26 (09): 49-&
  • [25] ARTIFACTS AND ERRORS IN THE ELECTRONYSTAGMOGRAPHIC (ENG) EVALUATION OF THE VESTIBULAR SYSTEM
    KILENY, P
    KEMINK, JL
    EAR AND HEARING, 1986, 7 (03): 151-156
  • [26] DETERMINATION OF MOLAR IR ABSORPTIVITIES AND THEIR ERRORS
    STAAT, H
    KORTE, EH
    JOURNAL OF MOLECULAR STRUCTURE, 1984, 114 (MAR): 297-300
  • [27] Evaluation of system measures for incomplete relevance judgment in IR
    Wu, Shengli
    McClean, Sally
    FLEXIBLE QUERY ANSWERING SYSTEMS, PROCEEDINGS, 2006, 4027: 245-256
  • [28] User Intent and Assessor Disagreement in Web Search Evaluation
    Kazai, Gabriella
    Yilmaz, Emine
    Craswell, Nick
    Tahaghoghi, S. M. M.
    PROCEEDINGS OF THE 22ND ACM INTERNATIONAL CONFERENCE ON INFORMATION & KNOWLEDGE MANAGEMENT (CIKM'13), 2013: 699-708
  • [29] INFLUENCES ON INFERENCES - EFFECT OF ERRORS IN DATA ON STATISTICAL EVALUATION
    LEVITT, SH
    AEPPLI, DM
    POTISH, RA
    LEE, CK
    NIERENGARTEN, ME
    CANCER, 1993, 72 (07): 2075-2082
  • [30] Evaluation of a correction for photometric errors in FT-IR spectrometry introduced by a nonlinear detector response
    Richardson, RL
    Yang, HS
    Griffiths, PR
    APPLIED SPECTROSCOPY, 1998, 52 (04): 565-571