Clustering with missing and left-censored data: A simulation study comparing multiple-imputation-based procedures

Authors
Faucheux, Lilith [1, 2]
Resche-Rigon, Matthieu [1, 3]
Curis, Emmanuel [3, 4]
Soumelis, Vassili [2, 5]
Chevret, Sylvie [1, 3]
Affiliations
[1] Univ Paris, Sorbonne Paris Cite, ECSTRRA Team, INSERM UMR1153, Paris, France
[2] Univ Paris, Sorbonne Paris Cite, INSERM U976, Paris, France
[3] Hop St Louis, AP HP, Serv Biostat & Informat Med, Paris, France
[4] Univ Paris, Sorbonne Paris Cite, Lab Biomath Plateau IB2 EA 7537 BioSTM, Fac Pharm, Paris, France
[5] Hop St Louis, AP HP, Lab Immunol Biol & Histocompatibil, Paris, France
Keywords
breast cancer; clustering; consensus; left-censored data; missing data; multiple imputation; LIMIT; QUANTIFICATION; INFERENCE; IMPACT
DOI
10.1002/bimj.201900366
Chinese Library Classification
Q [Biological Sciences]
Subject classification codes
07; 0710; 09
Abstract
Cluster analysis, commonly used to explore large biomedical datasets, can be challenging, notably due to missing data or left-censored data induced by the sensitivity limits of the biochemical measurement method. Usually, complete-case analysis, simple imputation, or stochastic simple imputation are applied before clustering. More recently, consensus methods following multiple imputation have been proposed. However, they ignore left-censoring and do not allow the number of clusters to vary across the partitions of each imputed dataset. Here, we developed a consensus-based clustering algorithm in which left-censored data are taken into account using a modified multiple imputation method and the number of clusters is estimated for each imputed dataset. A simulation study was conducted to assess the performance in terms of the number of clusters, the percentage of unclassified observations, and the adjusted Rand index. The simulation results showed that the investigated method works well compared to several alternative approaches. A real-world application in breast cancer patients showed that the proposed method may reveal novel clusters of patients.
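The adjusted Rand index named in the abstract compares two partitions of the same observations while correcting for chance agreement. The following is a minimal sketch of the standard Hubert-Arabie formula in plain Python; it is not the authors' code, and the function name `adjusted_rand_index` and the toy label vectors are illustrative assumptions.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two partitions of the same observations.

    Illustrative helper (hypothetical name), not taken from the paper.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must label the same observations")
    n = len(labels_a)
    if n < 2:
        # No pairs of observations to compare; treat as perfect agreement.
        return 1.0

    # Pairs placed together in both partitions (contingency-table cells).
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs placed together within each partition separately.
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = sum_a * sum_b / comb(n, 2)  # agreement expected by chance
    maximum = (sum_a + sum_b) / 2          # largest attainable agreement
    if maximum == expected:
        # Degenerate case, e.g. both partitions put everything in one cluster.
        return 1.0
    return (sum_cells - expected) / (maximum - expected)


# Toy example: the index ignores how clusters are named.
true_labels = [0, 0, 1, 1]
relabelled = [1, 1, 0, 0]
print(adjusted_rand_index(true_labels, relabelled))
```

In a simulation setting such as the one described, an index of this kind would typically be computed between the simulated cluster membership and the partition returned by each procedure, with values near 1 indicating close recovery and values near 0 indicating chance-level agreement.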
Pages: 372-393
Page count: 22