On Transforming Relevance Scales

Cited by: 14
Authors
Han, Lei [1 ]
Roitero, Kevin [2 ]
Maddalena, Eddy [3 ]
Mizzaro, Stefano [2 ]
Demartini, Gianluca [1 ]
Affiliations
[1] Univ Queensland, Brisbane, Qld, Australia
[2] Univ Udine, Udine, Italy
[3] Univ Southampton, Southampton, Hants, England
Keywords
Crowdsourcing; IR Evaluation; Assessor Agreement; Relevance Scales;
DOI
10.1145/3357384.3357988
Chinese Library Classification
TP301 [Theory, Methods];
Discipline Classification Code
081202 ;
Abstract
Information Retrieval (IR) researchers have often used existing IR evaluation collections and transformed the relevance scale in which judgments were collected, e.g., to use metrics that assume binary judgments, like Mean Average Precision. Such scale transformations are often arbitrary (e.g., 0,1 mapped to 0 and 2,3 mapped to 1), and it is assumed that they have no impact on the results of IR evaluation. Moreover, the use of crowdsourcing to collect relevance judgments has become a standard methodology. When designing the crowdsourcing relevance judgment task, one of the decisions to be made is how granular the relevance scale used to collect judgments should be. This decision then has repercussions on the metrics used to measure IR system effectiveness. In this paper we look at the effect of scale transformations in a systematic way. We perform extensive experiments to study the transformation of judgments from fine-grained to coarse-grained scales. We use different relevance judgments expressed on different relevance scales, either expressed by expert annotators or collected by means of crowdsourcing. The objective is to understand the impact of relevance scale transformations on IR evaluation outcomes and to draw conclusions on how to best transform judgments into a different scale, when necessary.
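The kind of transformation the abstract mentions (e.g., mapping grades 0,1 to 0 and grades 2,3 to 1) can be sketched as a simple thresholding step. This is a minimal illustrative sketch, not the paper's method; the 4-level scale and the threshold of 2 are assumptions chosen to match the abstract's example.

```python
# Sketch of a fine-to-coarse relevance scale transformation: collapsing a
# 4-level graded scale (0-3) into a binary one. The threshold of 2 is an
# assumed choice (grades 0,1 -> 0; grades 2,3 -> 1), as in the abstract's
# example; it is not prescribed by the paper.

def binarize(judgments, threshold=2):
    """Map graded relevance judgments to binary: grade >= threshold -> 1."""
    return [1 if grade >= threshold else 0 for grade in judgments]

graded = [0, 1, 2, 3, 1, 3]
print(binarize(graded))  # [0, 0, 1, 1, 0, 1]
```

As the abstract notes, different (equally arbitrary) threshold choices can change which documents count as relevant, and hence the values of binary metrics such as Mean Average Precision.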
Pages: 39-48
Page count: 10