Testing a tool for the classification of study designs in systematic reviews of interventions and exposures showed moderate reliability and low accuracy

Times Cited: 49
Authors
Hartling, Lisa [1,2]
Bond, Kenneth [2 ]
Santaguida, P. Lina [3 ]
Viswanathan, Meera [4 ]
Dryden, Donna M. [2 ]
Affiliations
[1] Univ Alberta, Dept Pediat, Alberta Res Ctr Hlth Evidence, Aberhart Ctr, Edmonton, AB T6G 2J3, Canada
[2] Univ Alberta, Univ Alberta Evidence Based Practice Ctr, Edmonton, AB T6G 2J3, Canada
[3] McMaster Univ, Dept Clin Epidemiol & Biostat, McMaster Univ Evidence Based Practice Ctr, Hamilton, ON, Canada
[4] RTI Int, Div Hlth Serv & Social Policy Res, Res Triangle Pk, NC USA
Funding
Agency for Healthcare Research and Quality (AHRQ), United States
Keywords
Research design; Classification; Reliability; Validity; Systematic review; Observational studies; Nonrandomized studies; SAMPLE-SIZE; CHALLENGES;
DOI
10.1016/j.jclinepi.2011.01.010
CLC Classification
R19 [Health care organization and services (health administration)]
Abstract
Objectives: To develop and test a study design classification tool.

Study Design: We contacted relevant organizations and individuals to identify tools used to classify study designs and ranked these using predefined criteria. The highest ranked tool was a design algorithm developed, but no longer advocated, by the Cochrane Non-Randomized Studies Methods Group; we modified this to include additional study designs and decision points. We developed a reference classification for 30 studies; six testers applied the tool to these studies. Interrater reliability (Fleiss' kappa) and accuracy against the reference classification were assessed. The tool was then further revised and retested.

Results: Initial reliability was fair among the testers (kappa = 0.26) and the reference standard raters (kappa = 0.33). Testing after revisions showed improved reliability (kappa = 0.45, moderate agreement) with improved, but still low, accuracy. The most common disagreements were whether the study design was experimental (5 of 15 studies) and whether there was a comparison of any kind (4 of 15 studies). Agreement was higher among testers who had completed graduate-level training than among those who had not.

Conclusion: The moderate reliability and low accuracy may reflect a lack of clarity and comprehensiveness in the tool, inadequate reporting of the studies, and variability in tester characteristics. The results may not be generalizable to all published studies, as the test studies were selected because they had posed design classification challenges for previous reviewers. Application of such a tool should be accompanied by training, pilot testing, and context-specific decision rules. (C) 2011 Elsevier Inc. All rights reserved.
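The reliability analysis rests on Fleiss' kappa, which generalizes Cohen's kappa to more than two raters by comparing observed per-study agreement against the agreement expected from the overall category proportions. Below is a minimal sketch of that calculation in Python; the rating counts are hypothetical and only illustrate the setup the authors describe (multiple testers classifying studies into design categories), not the paper's data.

```python
# Minimal sketch of Fleiss' kappa for multi-rater agreement.
# counts[i, j] = number of raters who placed study i in category j.
# The example matrix below is hypothetical, not data from the paper.
import numpy as np

def fleiss_kappa(counts: np.ndarray) -> float:
    n_studies = counts.shape[0]
    m = counts.sum(axis=1)[0]                      # raters per study (assumed constant)
    p_j = counts.sum(axis=0) / (n_studies * m)     # overall category proportions
    P_i = (np.square(counts).sum(axis=1) - m) / (m * (m - 1))  # per-study agreement
    P_bar, P_e = P_i.mean(), np.square(p_j).sum()  # observed vs. chance agreement
    return float((P_bar - P_e) / (1 - P_e))

# Hypothetical example: 5 studies, 6 raters, 3 design categories
# (e.g., experimental / cohort / case-control).
ratings = np.array([
    [6, 0, 0],
    [4, 2, 0],
    [3, 2, 1],
    [0, 5, 1],
    [2, 2, 2],
])
print(round(fleiss_kappa(ratings), 2))  # prints 0.2 for this toy matrix
```

By the commonly used Landis and Koch benchmarks, kappa values of 0.21-0.40 count as fair and 0.41-0.60 as moderate, which is how the paper's kappa = 0.26 and kappa = 0.45 are labeled.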
Pages: 861-871
Number of pages: 11
Related Papers
9 records
  • [1] A newly developed tool for classifying study designs in systematic reviews of interventions and exposures showed substantial reliability and validity
    Seo, Hyun-Ju
    Kim, Soo Young
    Lee, Yoon Jae
    Jang, Bo-Hyoung
    Park, Ji-Eun
    Sheen, Seung-Soo
    Hahn, Seo Kyung
    [J]. JOURNAL OF CLINICAL EPIDEMIOLOGY, 2016, 70 : 200 - 205
  • [2] Study Designs and Systematic Reviews of Interventions: Building Evidence Across Study Designs
    Sargeant, J. M.
    Kelton, D. F.
    O'Connor, A. M.
    [J]. ZOONOSES AND PUBLIC HEALTH, 2014, 61 : 10 - 17
  • [3] The risk of bias in systematic reviews tool showed fair reliability and good construct validity
    Bühn, Stefanie
    Mathes, Tim
    Prengel, Peggy
    Wegewitz, Uta
    Ostermann, Thomas
    Robens, Sibylle
    Pieper, Dawid
    [J]. JOURNAL OF CLINICAL EPIDEMIOLOGY, 2017, 91 : 121 - 128
  • [4] An algorithm for the classification of study designs to assess diagnostic, prognostic and predictive test accuracy in systematic reviews
    Mathes, Tim
    Pieper, Dawid
    [J]. SYSTEMATIC REVIEWS, 2019, 8 (01)
  • [5] Testing a tool for assessing the risk of bias for nonrandomized studies showed moderate reliability and promising validity
    Kim, Soo Young
    Park, Ji Eun
    Lee, Yoon Jae
    Seo, Hyun-Ju
    Sheen, Seung-Soo
    Hahn, Seokyung
    Jang, Bo-Hyoung
    Son, Hee-Jung
    [J]. JOURNAL OF CLINICAL EPIDEMIOLOGY, 2013, 66 (04) : 408 - 414
  • [6] Development of an algorithm to provide awareness in choosing study designs for inclusion in systematic reviews of healthcare interventions: a method study
    Peinemann, Frank
    Kleijnen, Jos
    [J]. BMJ OPEN, 2015, 5 (08)
  • [7] Testing the Risk of Bias tool showed low reliability between individual reviewers and across consensus assessments of reviewer pairs
    Hartling, Lisa
    Hamm, Michele P.
    Milne, Andrea
    Vandermeer, Ben
    Santaguida, P. Lina
    Ansari, Mohammed
    Tsertsvadze, Alexander
    Hempel, Susanne
    Shekelle, Paul
    Dryden, Donna M.
    [J]. JOURNAL OF CLINICAL EPIDEMIOLOGY, 2013, 66 (09) : 973 - 981
  • [8] From the Trenches: A Cross-Sectional Study Applying the GRADE Tool in Systematic Reviews of Healthcare Interventions
    Hartling, Lisa
    Fernandes, Ricardo M.
    Seida, Jennifer
    Vandermeer, Ben
    Dryden, Donna M.
    [J]. PLOS ONE, 2012, 7 (04)
  • [9] Quality appraisal in systematic reviews of public health interventions: an empirical study on the impact of choice of tool on meta-analysis
    Voss, Peer H.
    Rehfuess, Eva A.
    [J]. JOURNAL OF EPIDEMIOLOGY AND COMMUNITY HEALTH, 2013, 67 (01) : 98 - 104