Testing a tool for the classification of study designs in systematic reviews of interventions and exposures showed moderate reliability and low accuracy

Times Cited: 49
Authors
Hartling, Lisa [1,2]
Bond, Kenneth [2 ]
Santaguida, P. Lina [3 ]
Viswanathan, Meera [4 ]
Dryden, Donna M. [2 ]
Affiliations
[1] Univ Alberta, Dept Pediat, Alberta Res Ctr Hlth Evidence, Aberhart Ctr, Edmonton, AB T6G 2J3, Canada
[2] Univ Alberta, Univ Alberta Evidence Based Practice Ctr, Edmonton, AB T6G 2J3, Canada
[3] McMaster Univ, Dept Clin Epidemiol & Biostat, McMaster Univ Evidence Based Practice Ctr, Hamilton, ON, Canada
[4] RTI Int, Div Hlth Serv & Social Policy Res, Res Triangle Pk, NC USA
Funding
Agency for Healthcare Research and Quality (AHRQ), United States
Keywords
Research design; Classification; Reliability; Validity; Systematic review; Observational studies; Nonrandomized studies; SAMPLE-SIZE; CHALLENGES;
DOI
10.1016/j.jclinepi.2011.01.010
CLC Classification
R19 [Health care organization and services (health administration)]
Abstract
Objectives: To develop and test a study design classification tool.

Study Design: We contacted relevant organizations and individuals to identify tools used to classify study designs and ranked these using predefined criteria. The highest ranked tool was a design algorithm developed, but no longer advocated, by the Cochrane Non-Randomized Studies Methods Group; we modified this to include additional study designs and decision points. We developed a reference classification for 30 studies; six testers applied the tool to these studies. Interrater reliability (Fleiss' kappa) and accuracy against the reference classification were assessed. The tool was then further revised and retested.

Results: Initial reliability was fair among the testers (kappa = 0.26) and the reference standard raters (kappa = 0.33). Testing after revisions showed improved reliability (kappa = 0.45, moderate agreement) with improved, but still low, accuracy. The most common disagreements were whether the study design was experimental (5 of 15 studies) and whether there was a comparison of any kind (4 of 15 studies). Agreement was higher among testers who had completed graduate-level training than among those who had not.

Conclusion: The moderate reliability and low accuracy may reflect a lack of clarity and comprehensiveness in the tool, inadequate reporting of the studies, and variability in tester characteristics. The results may not be generalizable to all published studies, as the test studies were selected because they had posed design classification challenges for previous reviewers. Application of such a tool should be accompanied by training, pilot testing, and context-specific decision rules. (C) 2011 Elsevier Inc. All rights reserved.
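The reliability analysis rests on Fleiss' kappa, which generalizes Cohen's kappa to more than two raters by comparing observed per-study agreement against the agreement expected from the overall category proportions. Below is a minimal sketch of that calculation in Python; the rating counts are hypothetical and only illustrate the setup the authors describe (multiple testers classifying studies into design categories), not the paper's data.

```python
# Minimal sketch of Fleiss' kappa for multi-rater agreement.
# counts[i, j] = number of raters who placed study i in category j.
# The example matrix below is hypothetical, not data from the paper.
import numpy as np

def fleiss_kappa(counts: np.ndarray) -> float:
    n_studies = counts.shape[0]
    m = counts.sum(axis=1)[0]                      # raters per study (assumed constant)
    p_j = counts.sum(axis=0) / (n_studies * m)     # overall category proportions
    P_i = (np.square(counts).sum(axis=1) - m) / (m * (m - 1))  # per-study agreement
    P_bar, P_e = P_i.mean(), np.square(p_j).sum()  # observed vs. chance agreement
    return float((P_bar - P_e) / (1 - P_e))

# Hypothetical example: 5 studies, 6 raters, 3 design categories
# (e.g., experimental / cohort / case-control).
ratings = np.array([
    [6, 0, 0],
    [4, 2, 0],
    [3, 2, 1],
    [0, 5, 1],
    [2, 2, 2],
])
print(round(fleiss_kappa(ratings), 2))  # prints 0.2 for this toy matrix
```

By the commonly used Landis and Koch benchmarks, kappa values of 0.21-0.40 count as fair and 0.41-0.60 as moderate, which is how the paper's kappa = 0.26 and kappa = 0.45 are labeled.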
Pages: 861-871
Number of pages: 11
Related Papers
9 records
  • [1] A newly developed tool for classifying study designs in systematic reviews of interventions and exposures showed substantial reliability and validity
    Seo, Hyun-Ju
    Kim, Soo Young
    Lee, Yoon Jae
    Jang, Bo-Hyoung
    Park, Ji-Eun
    Sheen, Seung-Soo
    Hahn, Seo Kyung
    [J]. JOURNAL OF CLINICAL EPIDEMIOLOGY, 2016, 70 : 200 - 205
  • [2] Study Designs and Systematic Reviews of Interventions: Building Evidence Across Study Designs
    Sargeant, J. M.
    Kelton, D. F.
    O'Connor, A. M.
    [J]. ZOONOSES AND PUBLIC HEALTH, 2014, 61 : 10 - 17
  • [3] The risk of bias in systematic reviews tool showed fair reliability and good construct validity
    Bühn, Stefanie
    Mathes, Tim
    Prengel, Peggy
    Wegewitz, Uta
    Ostermann, Thomas
    Robens, Sibylle
    Pieper, Dawid
    [J]. JOURNAL OF CLINICAL EPIDEMIOLOGY, 2017, 91 : 121 - 128
  • [4] An algorithm for the classification of study designs to assess diagnostic, prognostic and predictive test accuracy in systematic reviews
    Mathes, Tim
    Pieper, Dawid
    [J]. SYSTEMATIC REVIEWS, 2019, 8 (01)
  • [5] Testing a tool for assessing the risk of bias for nonrandomized studies showed moderate reliability and promising validity
    Kim, Soo Young
    Park, Ji Eun
    Lee, Yoon Jae
    Seo, Hyun-Ju
    Sheen, Seung-Soo
    Hahn, Seokyung
    Jang, Bo-Hyoung
    Son, Hee-Jung
    [J]. JOURNAL OF CLINICAL EPIDEMIOLOGY, 2013, 66 (04) : 408 - 414
  • [6] Development of an algorithm to provide awareness in choosing study designs for inclusion in systematic reviews of healthcare interventions: a method study
    Peinemann, Frank
    Kleijnen, Jos
    [J]. BMJ OPEN, 2015, 5 (08)
  • [7] Testing the Risk of Bias tool showed low reliability between individual reviewers and across consensus assessments of reviewer pairs
    Hartling, Lisa
    Hamm, Michele P.
    Milne, Andrea
    Vandermeer, Ben
    Santaguida, P. Lina
    Ansari, Mohammed
    Tsertsvadze, Alexander
    Hempel, Susanne
    Shekelle, Paul
    Dryden, Donna M.
    [J]. JOURNAL OF CLINICAL EPIDEMIOLOGY, 2013, 66 (09) : 973 - 981
  • [8] From the Trenches: A Cross-Sectional Study Applying the GRADE Tool in Systematic Reviews of Healthcare Interventions
    Hartling, Lisa
    Fernandes, Ricardo M.
    Seida, Jennifer
    Vandermeer, Ben
    Dryden, Donna M.
    [J]. PLOS ONE, 2012, 7 (04)
  • [9] Quality appraisal in systematic reviews of public health interventions: an empirical study on the impact of choice of tool on meta-analysis
    Voss, Peer H.
    Rehfuess, Eva A.
    [J]. JOURNAL OF EPIDEMIOLOGY AND COMMUNITY HEALTH, 2013, 67 (01) : 98 - 104