Evaluation of system measures for incomplete relevance judgment in IR

Cited by: 0
Authors
Wu, Shengli [1]
McClean, Sally [1]
Affiliations
[1] Univ Ulster, Sch Comp & Math, Coleraine BT52 1SA, Londonderry, North Ireland
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Incomplete relevance judgment has become the norm in some major information retrieval evaluation events such as TREC, but its effect on some system measures is not well understood. In this paper, we evaluate four system measures under incomplete relevance judgment: mean average precision, R-precision, normalized average precision over all documents, and normalized discounted cumulative gain. Of these, normalized average precision over all documents is newly introduced, and both mean average precision and R-precision are generalized for graded relevance judgment. The four measures share a common characteristic: computing their exact values requires complete relevance judgment. We investigate these measures empirically through extensive experiments on TREC data, aiming to determine the effect of incomplete relevance judgment on them. From these experiments, we conclude that incomplete relevance judgment significantly affects the values of all four measures. When the pooling method is used as in TREC, the more incomplete the relevance judgment is, the higher the values of all these measures usually become. We also conclude that mean average precision is the most sensitive but least reliable measure, that normalized discounted cumulative gain and normalized average precision over all documents are the most reliable but least sensitive, and that R-precision falls in between.
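To make the effect of missing judgments concrete, the following is a minimal sketch using the common textbook definitions of binary average precision, R-precision, and NDCG. It is not the authors' implementation: the graded generalizations and the normalized average precision over all documents described in the abstract are not reproduced here, the `log2(rank + 1)` discount is an assumed convention that may differ from the paper's, and the document IDs and judgments are invented toy data. Unjudged documents are treated as non-relevant, as is usual with pooled judgments.

```python
import math


def average_precision(ranking, relevant):
    """Binary AP: mean of precision at each rank where a judged-relevant doc appears."""
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def r_precision(ranking, relevant):
    """Precision at rank R, where R is the number of judged-relevant docs."""
    r = len(relevant)
    if r == 0:
        return 0.0
    return sum(1 for doc in ranking[:r] if doc in relevant) / r


def ndcg(ranking, gains):
    """NDCG with a log2(rank + 1) discount (assumed convention).

    `gains` maps doc id -> gain; docs missing from it count as gain 0.
    The ideal ranking is built from all judged gains, without truncation.
    """
    dcg = sum(gains.get(doc, 0) / math.log2(rank + 1)
              for rank, doc in enumerate(ranking, start=1))
    ideal = sorted(gains.values(), reverse=True)
    idcg = sum(g / math.log2(rank + 1)
               for rank, g in enumerate(ideal, start=1))
    return dcg / idcg if idcg > 0 else 0.0


# Hypothetical toy data: d5 is relevant but was never judged (e.g. it fell
# outside the pool), so the incomplete judgments omit it.
ranking = ["d1", "d2", "d3", "d4", "d5"]
complete = {"d1", "d2", "d5"}
incomplete = {"d1", "d2"}

for name, rel in [("complete", complete), ("incomplete", incomplete)]:
    gains = {doc: 1 for doc in rel}  # binary gains for the NDCG call
    print(name,
          round(average_precision(ranking, rel), 4),
          round(r_precision(ranking, rel), 4),
          round(ndcg(ranking, gains), 4))
```

In this toy case all three values rise when the low-ranked relevant document goes unjudged, which matches the direction the abstract reports for pooled judgments. A single hand-built example is only an illustration, not evidence for the general claim.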
Pages: 245-256
Number of pages: 12