Improving Consensus Scoring of Crowdsourced Data Using the Rasch Model: Development and Refinement of a Diagnostic Instrument

Cited by: 10
Authors
Brady, Christopher John [1]
Mudie, Lucy Iluka [1]
Wang, Xueyang [1]
Guallar, Eliseo [2]
Friedman, David Steven [1,2]
Affiliations
[1] Johns Hopkins Univ, Sch Med, Wilmer Eye Inst, Dana Ctr Prevent Ophthalmol, 600 N Wolfe St, Baltimore, MD 21205 USA
[2] Johns Hopkins Univ, Dept Epidemiol, Bloomberg Sch Publ Hlth, Baltimore, MD USA
Funding
National Institutes of Health (USA)
Keywords
crowdsourcing; diabetic retinopathy; Rasch analysis; Amazon Mechanical Turk; DIABETIC-RETINOPATHY; TELEMEDICINE; RISK; MELLITUS; IMAGES;
DOI
10.2196/jmir.7984
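Chinese Library Classification number
R19 [Health care organizations and services (health services management)]
Discipline classification code
Abstract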
Background: Diabetic retinopathy (DR) is a leading cause of vision loss in working-age individuals worldwide. While screening is effective and cost-effective, it remains underutilized, and novel methods are needed to increase detection of DR. This clinical validation study compared diagnostic gradings of retinal fundus photographs provided by volunteers on the Amazon Mechanical Turk (AMT) crowdsourcing marketplace with expert-provided gold-standard grading and explored whether determination of the consensus of crowdsourced classifications could be improved beyond a simple majority vote (MV) using regression methods.
Objective: The aim of our study was to determine whether regression methods could be used to improve the consensus grading of data collected by crowdsourcing.
Methods: A total of 1200 retinal images of individuals with diabetes mellitus from the Messidor public dataset were posted to AMT. Eligible crowdsourcing workers had at least 500 previously approved tasks with an approval rating of 99% across their prior submitted work. A total of 10 workers were recruited to classify each image as normal or abnormal. If half or more of the workers judged the image to be abnormal, the MV consensus grade was recorded as abnormal. Rasch analysis was then used to calculate worker ability scores in a random 50% training set, and these scores were used as weights in a regression model in the remaining 50% test set to determine whether a more accurate consensus could be devised. Outcomes of interest were the percentage of correctly classified images, sensitivity, specificity, and area under the receiver operating characteristic curve (AUROC) for the consensus grade as compared with the expert grading provided with the dataset.
Results: Using MV grading, the consensus was correct in 75.5% of images (906/1200), with 75.5% sensitivity, 75.5% specificity, and an AUROC of 0.75 (95% CI 0.73-0.78). A logistic regression model using Rasch-weighted individual scores generated an AUROC of 0.91 (95% CI 0.88-0.93) compared with 0.89 (95% CI 0.86-0.92) for a model using unweighted scores (chi-square P<.001). With a diagnostic cut-point set to achieve 90% sensitivity, 77.5% of images (465/600) were graded correctly, with 90.3% sensitivity, 68.5% specificity, and an AUROC of 0.79 (95% CI 0.76-0.83).
Conclusions: Crowdsourced interpretations of retinal images provide rapid and accurate results as compared with a gold-standard grading. Creating a logistic regression model using Rasch analysis to weight crowdsourced classifications by worker ability improves the accuracy of aggregated grades as compared with a simple majority vote.
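The consensus rules described in the Methods can be sketched in a few lines of Python. This is only an illustration, not the authors' code: the function names are hypothetical, votes are assumed to be coded 1 (abnormal) and 0 (normal), and `weighted_score` assumes positive per-worker weights, which is a simplification of using Rasch ability scores as weights. The study fed the weighted scores into a logistic regression model, a step that is not reproduced here.

```python
from typing import Sequence, Tuple


def majority_vote(votes: Sequence[int]) -> int:
    """Return 1 (abnormal) if half or more of the votes are 1, else 0 (normal)."""
    if len(votes) == 0:
        raise ValueError("at least one vote is required")
    # "Half or more" means a tie (e.g. 5 of 10) counts as abnormal.
    return 1 if 2 * sum(votes) >= len(votes) else 0


def weighted_score(votes: Sequence[int], weights: Sequence[float]) -> float:
    """Weighted fraction of abnormal votes (hypothetical weighting scheme).

    Assumes one positive weight per worker; such a score could serve as a
    predictor in a regression model.
    """
    if len(votes) != len(weights):
        raise ValueError("votes and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(v * w for v, w in zip(votes, weights)) / total


def sensitivity_specificity(
    predicted: Sequence[int], truth: Sequence[int]
) -> Tuple[float, float]:
    """Compare consensus grades with gold-standard grades (1 = abnormal)."""
    tp = sum(1 for p, t in zip(predicted, truth) if p == 1 and t == 1)
    fn = sum(1 for p, t in zip(predicted, truth) if p == 0 and t == 1)
    tn = sum(1 for p, t in zip(predicted, truth) if p == 0 and t == 0)
    fp = sum(1 for p, t in zip(predicted, truth) if p == 1 and t == 0)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("truth must contain both normal and abnormal cases")
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    # Toy data only: ten votes per image, as in the study design.
    toy_votes = [
        [1, 1, 1, 1, 1, 0, 0, 0, 0, 0],  # tie, graded abnormal
        [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],  # graded normal
    ]
    toy_truth = [1, 0]
    consensus = [majority_vote(v) for v in toy_votes]
    print(consensus, sensitivity_specificity(consensus, toy_truth))
```

Counting a tie as abnormal follows the "half or more" rule stated in the abstract; in a screening setting this errs toward referral over a missed case.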
Pages: 13
Related papers
50 in total
  • [1] Rasch modelling improves consensus scoring of crowdsourced data
    Brady, Christopher J.
    Mudie, Lucy
    Friedman, David S.
    INVESTIGATIVE OPHTHALMOLOGY & VISUAL SCIENCE, 2017, 58 (08)
  • [2] Using Rasch measurement for instrument rating scale refinement
    Peeters, Michael J.
    Augustine, Jill M.
    CURRENTS IN PHARMACY TEACHING AND LEARNING, 2023, 15 (01) : 110 - 118
  • [3] Development of a Cell Transmission Model Using Crowdsourced Data for Expressways
    Wijepala, W. M. R., V
    de Silva, G. Ld, I
    MORATUWA ENGINEERING RESEARCH CONFERENCE (MERCON 2021) / 7TH INTERNATIONAL MULTIDISCIPLINARY ENGINEERING RESEARCH CONFERENCE, 2021 : 468 - 473
  • [4] Enhanced Interpretation of Instrument Scales Using the Rasch Model
    John R. Thompson
    Joseph C. Cappelleri
    Christine Getter
    Andreas Pleil
    Martin Reichel
    Sebastian Wolf
    Drug information journal : DIJ / Drug Information Association, 2007, 41 : 541 - 550
  • [5] Environmental health knowledge of healthcare professionals: Instrument development and validation using the Rasch model
    Vrotsou, Kalliopi
    Subiza-Perez, Mikel
    Lertxundi, Aitana
    Vergara, Itziar
    Marti-Carrera, Itxaso
    de Retana, Lourdes Ochoa
    Duo, Irene
    Ibarluzea, Jesus
    ENVIRONMENTAL RESEARCH, 2023, 235
  • [6] Enhanced interpretation of instrument scales using the Rasch model
    Thompson, John R.
    Cappelleri, Joseph C.
    Getter, Christine
    Pleil, Andreas
    Reichel, Martin
    Wolf, Sebastian
    DRUG INFORMATION JOURNAL, 2007, 41 (04) : 541 - 550
  • [7] Development and Validation of Scientific Inquiry Literacy Instrument (SILI) Using Rasch Measurement Model
    Darman, Dina Rahmi
    Suhandi, Andi
    Kaniawati, Ida
    Samsudin, Achmad
    Wibowo, Firmanul Catur
    EDUCATION SCIENCES, 2024, 14 (03)
  • [8] Development Of A Patient-AT Trust Instrument Using Rasch Modeling
    David, Shannon L.
    Louk, Jamie
    Kang, Minsoo
    Ragan, Brian G.
    MEDICINE AND SCIENCE IN SPORTS AND EXERCISE, 2013, 45 (05) : 347 - 347
  • [9] Calibration of the barrier instrument using the Rasch Rating Scaling model
    Zhu, WM
    RESEARCH QUARTERLY FOR EXERCISE AND SPORT, 2001, 72 (01) : A98 - A98
  • [10] Using Rasch analysis for scale development and refinement in tourism: Theory and illustration
    Hergesell, Anja
    JOURNAL OF BUSINESS RESEARCH, 2022, 142 : 551 - 561