Statistical description of interrater variability in ordinal ratings

Cited by: 69

Authors:

Nelson, JC [1]

Pepe, MS [1]

Affiliation:

[1] Univ Washington, Dept Biostat, Seattle, WA 98195 USA

Source:

STATISTICAL METHODS IN MEDICAL RESEARCH | 2000, Volume 9, Issue 5

DOI:

10.1191/096228000701555262

Chinese Library Classification:

R19 [Health care organizations and services (health services administration)];

Abstract:

Ordinal categorical assessments are common in medical practice and in research. Variability in such measurements amongst raters making the assessments can be problematic. In this paper we consider how such variability can be described statistically. We review three current approaches, including kappa-type statistics, loglinear models for agreement, and latent class agreement models, and discuss their limitations. We present a new graphical approach to describing interrater variability that involves a simple frequency distribution display of the category probabilities. The method enables description of interrater variability when raters are a random sample from some population as opposed to the traditional setting in which only a few selected raters provide assessments. Advantages of this approach relative to current approaches include the following: (1) it provides a simple visual summary of the rating data, (2) description is closely linked to familiar methods for describing variability in continuous measurements, (3) interpretation is straightforward, and (4) a large sample of raters can be accommodated with ease. We illustrate the method on simulated ordinal data representing radiologists' ratings of mammography images and on rating data from a national image reading study of mammography screening.
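The abstract's central idea, a frequency distribution of category probabilities across a sample of raters, can be illustrated with a minimal sketch. This is not the authors' procedure, which the record does not reproduce. The simulated data, the function names, and the choice of a five-number summary are all assumptions made for illustration.

```python
import random
import statistics


def rater_category_proportions(ratings, n_categories):
    """Share of one rater's ratings that fall in each category 1..n_categories."""
    total = len(ratings)
    return [sum(1 for r in ratings if r == k) / total
            for k in range(1, n_categories + 1)]


def simulate_ratings(n_raters, n_images, n_categories, seed=0):
    """Hypothetical data: each rater gets their own category weights (assumption)."""
    rng = random.Random(seed)
    categories = list(range(1, n_categories + 1))
    all_ratings = []
    for _ in range(n_raters):
        weights = [rng.uniform(0.5, 1.5) for _ in categories]
        all_ratings.append(rng.choices(categories, weights=weights, k=n_images))
    return all_ratings


def summarize_across_raters(all_ratings, n_categories):
    """For each category, summarize how its proportion varies across raters."""
    per_rater = [rater_category_proportions(r, n_categories) for r in all_ratings]
    summary = {}
    for k in range(n_categories):
        values = sorted(p[k] for p in per_rater)
        q1, median, q3 = statistics.quantiles(values, n=4)
        summary[k + 1] = {"min": values[0], "q1": q1, "median": median,
                          "q3": q3, "max": values[-1]}
    return summary


if __name__ == "__main__":
    data = simulate_ratings(n_raters=50, n_images=100, n_categories=5)
    for category, stats in summarize_across_raters(data, 5).items():
        print(category, {name: round(v, 3) for name, v in stats.items()})
```

A wide spread between the minimum and maximum for a category would indicate that raters differ substantially in how often they use it. A plot of the per-rater proportions would be the graphical counterpart of this numeric summary.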

Pages: 475-496

Page count: 22
