Comparing methods of analysing datasets with small clusters: case studies using four paediatric datasets

Cited by: 20

Authors:

Marston, Louise ^{[1]}

Peacock, Janet L. ^{[6]}

Yu, Keming ^{[3]}

Brocklehurst, Peter ^{[7]}

Calvert, Sandra A. ^{[4]}

Greenough, Anne ^{[5]}

Marlow, Neil ^{[2]}

Affiliations:

[1] Brunel Univ, Dept Primary Care & Populat Hlth, Uxbridge UB8 3PH, Middx, England

[2] Brunel Univ, Inst Womens Hlth, UCL, Uxbridge UB8 3PH, Middx, England

[3] Brunel Univ, Sch Informat Syst Comp & Math, Uxbridge UB8 3PH, Middx, England

[4] Univ London, Dept Child Hlth, London WC1E 7HU, England

[5] Kings Coll London, Div Asthma Allergy & Lung Biol, Sch Med, London WC2R 2LS, England

[6] Univ Southampton, Dept Publ Hlth Sci & Med Stat, Southampton, Hants, England

[7] Univ Oxford, Natl Perinatal Epidemiol Unit, Oxford, England

Source:

PAEDIATRIC AND PERINATAL EPIDEMIOLOGY | 2009 / Volume 23 / Issue 04

Keywords:

multiple births; statistical methodology; multilevel model; generalised estimating equations; multiple linear regression; cluster; LONGITUDINAL DATA-ANALYSIS; RANDOMIZED-TRIALS; REGRESSION-MODELS; BINARY DATA; QUADRATURE; EXAMPLE; TWIN;

DOI:

10.1111/j.1365-3016.2009.01046.x

Chinese Library Classification number:

R1 [Preventive medicine, hygiene];

Discipline classification code:

1004 ; 120402 ;

Abstract:

Studies of prematurely born infants contain a relatively large percentage of multiple births, so the resulting data have a hierarchical structure with small clusters of size 1, 2 or 3. Ignoring the clustering may lead to incorrect inferences. The aim of this study was to compare statistical methods which can be used to analyse such data: generalised estimating equations, multilevel models, multiple linear regression and logistic regression. Four datasets which differed in total size and in percentage of multiple births (n = 254, multiple 18%; n = 176, multiple 9%; n = 10 098, multiple 3%; n = 1585, multiple 8%) were analysed. With the continuous outcome, two-level models produced similar results in the larger dataset, while generalised least squares multilevel modelling (ML GLS 'xtreg' in Stata) and maximum likelihood multilevel modelling (ML MLE 'xtmixed' in Stata) produced divergent estimates using the smaller dataset. For the dichotomous outcome, most methods, except generalised least squares multilevel modelling (ML GH 'xtlogit' in Stata) gave similar odds ratios and 95% confidence intervals within datasets. For the continuous outcome, our results suggest using multilevel modelling. We conclude that generalised least squares multilevel modelling (ML GLS 'xtreg' in Stata) and maximum likelihood multilevel modelling (ML MLE 'xtmixed' in Stata) should be used with caution when the dataset is small. Where the outcome is dichotomous and there is a relatively large percentage of non-independent data, it is recommended that these are accounted for in analyses using logistic regression with adjusted standard errors or multilevel modelling. If, however, the dataset has a small percentage of clusters greater than size 1 (e.g. a population dataset of children where there are few multiples) there appears to be less need to adjust for clustering.
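The abstract's closing point, that adjusting for clustering matters less when few clusters are larger than size 1, can be illustrated with a rough design-effect calculation. The sketch below is not taken from the paper: the cluster-size lists and the intraclass correlation of 0.5 are hypothetical values chosen for illustration, and the formula assumes an exchangeable within-cluster correlation and a simple sample mean.

```python
def percent_multiples(cluster_sizes):
    """Percentage of individuals who belong to a cluster of size > 1."""
    n = sum(cluster_sizes)
    if n == 0:
        raise ValueError("no observations")
    return 100.0 * sum(m for m in cluster_sizes if m > 1) / n


def design_effect(cluster_sizes, icc):
    """Approximate variance inflation of a sample mean due to clustering.

    Assumes an exchangeable correlation `icc` within each cluster:
    deff = 1 + (sum(m_i^2) / sum(m_i) - 1) * icc
    which reduces to 1 + (m - 1) * icc when all clusters have size m.
    """
    n = sum(cluster_sizes)
    if n == 0:
        raise ValueError("no observations")
    weighted_mean_size = sum(m * m for m in cluster_sizes) / n
    return 1.0 + (weighted_mean_size - 1.0) * icc


# Hypothetical datasets of 100 infants each (NOT the paper's data).
few_multiples = [1] * 90 + [2] * 5    # 90 singletons, 5 twin pairs
many_multiples = [1] * 60 + [2] * 20  # 60 singletons, 20 twin pairs

ASSUMED_ICC = 0.5  # assumed within-family correlation, for illustration only

for label, sizes in [("few", few_multiples), ("many", many_multiples)]:
    print(
        f"{label}: {percent_multiples(sizes):.0f}% multiples, "
        f"design effect {design_effect(sizes, ASSUMED_ICC):.2f}"
    )
```

Under these assumptions, the variance inflation stays close to 1 when multiples are rare and grows as their share rises, which is consistent with the abstract's recommendation to account for clustering mainly when the percentage of non-independent data is relatively large.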

Pages: 380-392

Number of pages: 13
