Issues in multiple imputation of missing data for large general practice clinical databases

Times cited: 83
Authors
Marston, Louise [1]
Carpenter, James R. [2]
Walters, Kate R. [1]
Morris, Richard W. [1]
Nazareth, Irwin [1, 3]
Petersen, Irene [1]
Affiliations
[1] UCL, Department of Primary Care & Population Health, London NW3 2PF, England
[2] London School of Hygiene & Tropical Medicine, Medical Statistics Unit, London WC1E 7HT, England
[3] MRC General Practice Research Framework, London NW1 2ND, England
Funding
Economic and Social Research Council (UK); Medical Research Council (UK)
Keywords
clinical databases; missing data; multiple imputation; primary care databases; coronary heart disease; myocardial infarction; blood pressure; mortality; risk; meta-analysis; smoking; cohort; adults; women
DOI
10.1002/pds.1934
Chinese Library Classification (CLC) number
R1 [Preventive medicine, hygiene]
Discipline classification codes
1004; 120402
Abstract

Purpose: Missing data are a substantial problem in clinical databases. This paper aims to examine patterns of missing data in a primary care database, compare them with nationally representative datasets and explore the use of multiple imputation (MI) for these data.

Methods: The patterns and extent of missing health indicators in a UK primary care database (THIN) were quantified using 488 384 patients aged 16 or over in their first year after registration with a GP, drawn from 354 general practices. MI models were developed and the resulting data compared with those from nationally representative datasets (14 142 participants aged 16 or over from the Health Survey for England 2006 (HSE) and 4 252 men from the British Regional Heart Study (BRHS)).

Results: Between 22% (smoking) and 38% (height) of health indicator data were missing in patients newly registered in 2004-2006. Distributions of height, weight and blood pressure were comparable to HSE and BRHS, but those of alcohol and smoking were not. After MI, the percentage of smokers and non-drinkers was higher in THIN than in the comparison datasets, while the percentage of ex-smokers and heavy drinkers was lower. Height, weight and blood pressure remained similar to the comparison datasets.

Conclusions: Given the available data, the results are consistent with smoking and alcohol data being missing not at random, and with height, weight and blood pressure data being missing at random. Further research is required on suitable imputation methods for smoking and alcohol in such databases.

Copyright (C) 2010 John Wiley & Sons, Ltd.
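The conclusions hinge on the difference between data missing at random (MAR) and missing not at random (MNAR). The following Python sketch is a minimal illustration under invented assumptions, not the authors' imputation models: it simulates a hypothetical cohort with a binary smoking indicator and a fully observed age group, imputes missing smoking status several times from a Beta posterior (uniform prior) within age strata, which is an imputation model that assumes MAR, and pools the prevalence estimates with Rubin's rules. All function names, prevalences and missingness rates are hypothetical; the MNAR scenario assumes, purely for illustration, that non-smokers are less likely to have their status recorded.

```python
import math
import random
import statistics


def simulate(n, mnar, rng):
    """Generate a hypothetical cohort of (young, smoker_or_None) records.

    Assumed values for illustration only: half the cohort is 'young';
    smoking prevalence is 0.30 in young and 0.15 in older people.
    """
    data = []
    for _ in range(n):
        young = rng.random() < 0.5
        smoker = rng.random() < (0.30 if young else 0.15)
        if mnar:
            # Hypothetical MNAR: non-smokers are less likely to be recorded,
            # so missingness depends on the (unobserved) value itself.
            p_missing = 0.15 if smoker else 0.40
        else:
            # MAR: missingness depends only on the fully observed age group.
            p_missing = 0.40 if young else 0.20
        data.append((young, None if rng.random() < p_missing else smoker))
    return data


def impute_once(data, rng):
    """One 'proper' imputation of missing smoking status, assuming MAR given age group.

    Within each age stratum, draw a smoking probability from its Beta posterior
    (uniform prior), then draw each missing value from that probability.
    """
    counts = {True: [0, 0], False: [0, 0]}  # stratum -> [smokers, non-smokers]
    for young, smoker in data:
        if smoker is not None:
            counts[young][0 if smoker else 1] += 1
    p_draw = {g: rng.betavariate(s + 1, ns + 1) for g, (s, ns) in counts.items()}
    return [s if s is not None else rng.random() < p_draw[g] for g, s in data]


def rubin_pool(estimates, variances):
    """Combine per-imputation estimates and variances with Rubin's rules."""
    m = len(estimates)
    q_bar = statistics.fmean(estimates)   # pooled point estimate
    w = statistics.fmean(variances)       # mean within-imputation variance
    b = statistics.variance(estimates)    # between-imputation variance
    return q_bar, w + (1 + 1 / m) * b     # total variance T


def mi_prevalence(data, m, rng):
    """Estimate smoking prevalence from m imputed datasets, pooled by Rubin's rules."""
    n = len(data)
    estimates, variances = [], []
    for _ in range(m):
        completed = impute_once(data, rng)
        p = sum(completed) / n
        estimates.append(p)
        variances.append(p * (1 - p) / n)  # binomial sampling variance of p
    return rubin_pool(estimates, variances)


if __name__ == "__main__":
    rng = random.Random(2010)
    true_prevalence = 0.5 * 0.30 + 0.5 * 0.15  # 0.225 by construction
    for mnar in (False, True):
        data = simulate(20_000, mnar, rng)
        est, var = mi_prevalence(data, m=10, rng=rng)
        label = "MNAR" if mnar else "MAR"
        print(f"{label}: pooled {est:.3f} (SE {math.sqrt(var):.3f}), true {true_prevalence:.3f}")
```

Under the MAR scenario the pooled estimate should land close to the true prevalence of 0.225. Under the assumed MNAR mechanism, an imputation model that conditions only on age group overestimates prevalence, because the unrecorded patients are disproportionately non-smokers. That is the same direction as the higher post-MI smoking percentage reported in the abstract, but the sketch is only an illustration and not a reanalysis of THIN.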
Pages: 618-626
Page count: 9