Evaluation of system measures for incomplete relevance judgment in IR

Cited by: 0

Authors:

Wu, Shengli [1]

McClean, Sally [1]

Affiliations:

[1] University of Ulster, School of Computing and Mathematics, Coleraine BT52 1SA, Londonderry, Northern Ireland

Source:

FLEXIBLE QUERY ANSWERING SYSTEMS, PROCEEDINGS | 2006 / Vol. 4027

Keywords:

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Incomplete relevance judgment has become a norm for the evaluation of some major information retrieval evaluation events such as TREC, but its effect on some system measures has not been well understood. In this paper, we evaluate four system measures, namely mean average precision, R-precision, normalized average precision over all documents, and normalized discount cumulative gain, under incomplete relevance judgment. Among them, the measure of normalized average precision over all documents is introduced, and both mean average precision and R-precision are generalized for graded relevance judgment. These four measures have a common characteristic: complete relevance judgment is required for the calculation of their accurate values. We empirically investigate these measures through extensive experimentation of TREC data and aim to find the effect of incomplete relevance judgment on them. From these experiments, we conclude that incomplete relevance judgment affects all these four measures' values significantly. When using the pooling method in TREC, the more incomplete the relevance judgment is, the higher the values of all these measures usually become. We also conclude that mean average precision is the most sensitive but least reliable measure, normalized discount cumulative gain and normalized average precision over all documents are the most reliable but least sensitive measures, while R-precision is in the middle.

Pages: 245-256

Page count: 12
