Evaluation of system measures for incomplete relevance judgment in IR

Citations: 0
Authors
Wu, Shengli [1 ]
McClean, Sally [1 ]
Affiliation
[1] Univ Ulster, Sch Comp & Math, Coleraine BT52 1SA, Londonderry, North Ireland
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Incomplete relevance judgment has become the norm in major information retrieval evaluation events such as TREC, but its effect on some system measures is not well understood. In this paper, we evaluate four system measures under incomplete relevance judgment: mean average precision, R-precision, normalized average precision over all documents, and normalized discounted cumulative gain. Of these, normalized average precision over all documents is newly introduced, and both mean average precision and R-precision are generalized for graded relevance judgment. The four measures share one characteristic: computing their exact values requires complete relevance judgment. We investigate these measures empirically through extensive experiments on TREC data, aiming to determine how incomplete relevance judgment affects them. From these experiments, we conclude that incomplete relevance judgment significantly affects the values of all four measures. With the pooling method used in TREC, the more incomplete the relevance judgment is, the higher the values of all these measures usually become. We also conclude that mean average precision is the most sensitive but least reliable measure, that normalized discounted cumulative gain and normalized average precision over all documents are the most reliable but least sensitive measures, and that R-precision lies in between.
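As a rough illustration of why these measures depend on complete judgments, the sketch below computes three of them for one query using common textbook definitions: binary average precision, binary R-precision, and NDCG with a log2 discount. These are assumptions, not necessarily the graded generalizations defined in the paper, and normalized average precision over all documents is omitted because the abstract does not define it. The function names, document ids, and judgments are hypothetical toy data.

```python
import math


def average_precision(ranking, qrels):
    """Binary AP for one query; documents missing from qrels count as non-relevant."""
    total_relevant = sum(1 for grade in qrels.values() if grade > 0)
    if total_relevant == 0:
        return 0.0
    hits, precision_sum = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if qrels.get(doc, 0) > 0:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / total_relevant


def r_precision(ranking, qrels):
    """Precision at rank R, where R is the number of judged-relevant documents."""
    r = sum(1 for grade in qrels.values() if grade > 0)
    if r == 0:
        return 0.0
    return sum(1 for doc in ranking[:r] if qrels.get(doc, 0) > 0) / r


def ndcg(ranking, qrels):
    """NDCG with gain = grade and discount = log2(rank + 1)."""
    def dcg(gains):
        return sum(g / math.log2(i + 1) for i, g in enumerate(gains, start=1))

    actual = dcg([qrels.get(doc, 0) for doc in ranking])
    ideal = dcg(sorted(qrels.values(), reverse=True)[:len(ranking)])
    return actual / ideal if ideal > 0 else 0.0


# Hypothetical toy data: the system retrieves d1..d4; d5 and d6 are relevant
# but were never retrieved, so a shallow pool would leave them unjudged.
ranking = ["d1", "d2", "d3", "d4"]
complete = {"d1": 1, "d2": 1, "d3": 0, "d4": 0, "d5": 1, "d6": 1}
incomplete = {"d1": 1, "d2": 1, "d3": 0, "d4": 0}  # d5, d6 missing

for name, measure in [("AP", average_precision),
                      ("R-precision", r_precision),
                      ("NDCG", ndcg)]:
    print(name, measure(ranking, complete), measure(ranking, incomplete))
```

On this toy query, dropping the two unretrieved relevant documents from the judgments shrinks the denominator of each measure, so all three values rise. That matches the direction of the effect the abstract reports for pooling, though a single constructed example says nothing about its size on real TREC data.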
Pages: 245-256
Number of pages: 12