Comparing methods of analysing datasets with small clusters: case studies using four paediatric datasets

Cited by: 20
Authors
Marston, Louise [1]
Peacock, Janet L. [6]
Yu, Keming [3]
Brocklehurst, Peter [7]
Calvert, Sandra A. [4]
Greenough, Anne [5]
Marlow, Neil [2]
Affiliations
[1] Brunel Univ, Dept Primary Care & Populat Hlth, Uxbridge UB8 3PH, Middx, England
[2] Brunel Univ, Inst Womens Hlth, UCL, Uxbridge UB8 3PH, Middx, England
[3] Brunel Univ, Sch Informat Syst Comp & Math, Uxbridge UB8 3PH, Middx, England
[4] Univ London, Dept Child Hlth, London WC1E 7HU, England
[5] Kings Coll London, Div Asthma Allergy & Lung Biol, Sch Med, London WC2R 2LS, England
[6] Univ Southampton, Dept Publ Hlth Sci & Med Stat, Southampton, Hants, England
[7] Univ Oxford, Natl Perinatal Epidemiol Unit, Oxford, England
Keywords
multiple births; statistical methodology; multilevel model; generalised estimating equations; multiple linear regression; cluster; LONGITUDINAL DATA-ANALYSIS; RANDOMIZED-TRIALS; REGRESSION-MODELS; BINARY DATA; QUADRATURE; EXAMPLE; TWIN
DOI
10.1111/j.1365-3016.2009.01046.x
Chinese Library Classification (CLC) number
R1 [Preventive medicine, hygiene]
Discipline classification code
1004; 120402
Abstract
Studies of prematurely born infants contain a relatively large percentage of multiple births, so the resulting data have a hierarchical structure with small clusters of size 1, 2 or 3. Ignoring the clustering may lead to incorrect inferences. The aim of this study was to compare statistical methods that can be used to analyse such data: generalised estimating equations, multilevel models, multiple linear regression and logistic regression. Four datasets which differed in total size and in percentage of multiple births (n = 254, multiple 18%; n = 176, multiple 9%; n = 10 098, multiple 3%; n = 1585, multiple 8%) were analysed. With the continuous outcome, two-level models produced similar results in the larger dataset, while generalised least squares multilevel modelling (ML GLS 'xtreg' in Stata) and maximum likelihood multilevel modelling (ML MLE 'xtmixed' in Stata) produced divergent estimates using the smaller dataset. For the dichotomous outcome, most methods, except multilevel modelling with Gauss-Hermite quadrature (ML GH 'xtlogit' in Stata), gave similar odds ratios and 95% confidence intervals within datasets. For the continuous outcome, our results suggest using multilevel modelling. We conclude that generalised least squares multilevel modelling (ML GLS 'xtreg' in Stata) and maximum likelihood multilevel modelling (ML MLE 'xtmixed' in Stata) should be used with caution when the dataset is small. Where the outcome is dichotomous and there is a relatively large percentage of non-independent data, it is recommended that these are accounted for in analyses using logistic regression with adjusted standard errors or multilevel modelling. If, however, the dataset has a small percentage of clusters greater than size 1 (e.g. a population dataset of children where there are few multiples), there appears to be less need to adjust for clustering.
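The abstract's closing point, that clustering matters less when few infants belong to clusters larger than 1, can be illustrated with the standard design-effect approximation for a simple mean under exchangeable within-cluster correlation: DEFF = 1 + (sum(m_i^2) / sum(m_i) - 1) * rho. The sketch below is not the authors' method. The function name `design_effect`, the cohort compositions and the correlation value are all assumptions chosen for illustration, and none of the numbers come from the paper's datasets.

```python
from typing import Sequence


def design_effect(cluster_sizes: Sequence[int], icc: float) -> float:
    """Approximate variance inflation of a simple mean when observations in
    the same cluster share an exchangeable correlation `icc`.

    Uses DEFF = 1 + (sum(m^2) / sum(m) - 1) * icc, where m is a cluster size.
    """
    n = sum(cluster_sizes)
    # Size-weighted mean cluster size: larger clusters count for more.
    weighted_mean_size = sum(m * m for m in cluster_sizes) / n
    return 1.0 + (weighted_mean_size - 1.0) * icc


# Hypothetical cohorts (NOT the paper's data): singletons plus twin pairs.
small_cohort = [1] * 82 + [2] * 9          # 100 infants, 18% from twin pairs
population_cohort = [1] * 970 + [2] * 15   # 1000 infants, 3% from twin pairs
ASSUMED_ICC = 0.5                          # assumed within-pair correlation

for label, sizes in (("small cohort", small_cohort),
                     ("population cohort", population_cohort)):
    share = sum(m for m in sizes if m > 1) / sum(sizes)
    deff = design_effect(sizes, ASSUMED_ICC)
    print(f"{label}: {share:.0%} multiples, design effect {deff:.3f}")
```

With only singletons and twin pairs the formula reduces to 1 + p * rho, where p is the proportion of infants who are twins, so under these assumed inputs the variance inflation is roughly 9% for the first cohort and 1.5% for the second. This covers the variance of a simple mean only; regression coefficients and odds ratios can behave differently.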
Pages: 380-392
Page count: 13