Exploring the Stability of Differential Item Functioning Across Administrations and Critical Values Using the Rasch Separate Calibration t-test Method
Cited by: 6
Authors:
Peabody, Michael R. [1]
Wind, Stefanie A. [2]
Affiliations:
[1] Amer Board Family Med, Psychometr Dept, Lexington, KY 40511 USA
[2] Univ Alabama, Educ Res Dept, Tuscaloosa, AL USA
Keywords:
Differential item functioning;
Rasch model;
certification;
multiple administrations;
MANTEL-HAENSZEL;
LOGISTIC-REGRESSION;
RESPONSE THEORY;
SIZE;
BIAS;
DOI:
10.1080/15366367.2018.1533782
Chinese Library Classification:
C [Social Sciences, General];
Discipline Classification Code:
03;
0303;
Abstract:
Differential item functioning (DIF) detection procedures provide validity evidence for proposed interpretations of test scores, helping researchers and practitioners ensure that scores are free from potential bias and that individual items do not advantage any subgroup of examinees over another. In this study, we use the Rasch separate calibration t-test method to examine how different levels of contrast, at varying levels of statistical significance, affect the flagging of items across multiple examination administrations. We assert that if DIF is a stable trait of an item, it should be sample-independent and detected each time the item is administered. We therefore examine how consistently different alpha levels and critical values identify DIF for the same items across multiple administrations. Our results suggest that, under our most lenient criteria, approximately 40% of items on any single administration may be flagged for DIF, but this rate drops to 20% when items must be flagged on two administrations and to 12% across three. Testing organizations can use the methods illustrated here to set their own DIF thresholds, which may be useful when estimating the time and cost of having independent reviewers examine flagged items.
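As a rough illustration of the separate calibration t-test logic described in the abstract, the Python sketch below assumes item difficulty estimates and standard errors are already available from two independent Rasch calibrations (reference and focal groups) and computes the difficulty contrast and pooled-error t statistic. The function names, the 0.5-logit contrast threshold, and the |t| > 2.0 critical value are illustrative assumptions, not values taken from the paper.

import math

def dif_t(d_ref, se_ref, d_foc, se_foc):
    # t statistic for one item: difficulty contrast between the two
    # separately calibrated groups, divided by the pooled standard error.
    return (d_ref - d_foc) / math.sqrt(se_ref**2 + se_foc**2)

def flag_item(d_ref, se_ref, d_foc, se_foc, min_contrast=0.5, t_crit=2.0):
    # Flag an item for DIF only when both the absolute logit contrast
    # and the absolute t statistic exceed their (assumed) thresholds.
    contrast = d_ref - d_foc
    t = dif_t(d_ref, se_ref, d_foc, se_foc)
    return abs(contrast) >= min_contrast and abs(t) >= t_crit

# Example: an item estimated at 0.85 logits (SE 0.12) in the reference
# group and 0.20 logits (SE 0.15) in the focal group.
print(flag_item(0.85, 0.12, 0.20, 0.15))  # True: contrast 0.65, t ~ 3.4

Under the study's stability argument, an item would be retained as exhibiting DIF only if it is flagged in this way on repeated administrations, e.g., by intersecting the sets of flagged items from two or three administrations.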
Pages: 78-92
Page count: 15