Clustering with missing and left-censored data: A simulation study comparing multiple-imputation-based procedures

Cited by: 8
Authors
Faucheux, Lilith [1, 2]
Resche-Rigon, Matthieu [1, 3]
Curis, Emmanuel [3, 4]
Soumelis, Vassili [2, 5]
Chevret, Sylvie [1, 3]
Affiliations
[1] Univ Paris, Sorbonne Paris Cite, ECSTRRA Team, INSERM UMR1153, Paris, France
[2] Univ Paris, Sorbonne Paris Cite, INSERM U976, Paris, France
[3] Hop St Louis, AP HP, Serv Biostat & Informat Med, Paris, France
[4] Univ Paris, Sorbonne Paris Cite, Lab Biomath Plateau IB2 EA 7537 BioSTM, Fac Pharm, Paris, France
[5] Hop St Louis, AP HP, Lab Immunol Biol & Histocompatibil, Paris, France
Keywords
breast cancer; clustering; consensus; left-censored data; missing data; multiple imputation; LIMIT; QUANTIFICATION; INFERENCE; IMPACT;
DOI
10.1002/bimj.201900366
Chinese Library Classification
Q [Biological Sciences]
Discipline classification codes
07; 0710; 09
Abstract
Cluster analysis, commonly used to explore large biomedical datasets, can be challenging, notably due to missing data or left-censored data induced by the sensitivity limits of the biochemical measurement method. Usually, complete-case analysis, simple imputation, or stochastic simple imputation are applied before clustering. More recently, consensus methods following multiple imputation have been proposed. However, they ignore left-censoring and do not allow the number of clusters to vary across the partitions of each imputed dataset. Here, we developed a consensus-based clustering algorithm in which left-censored data are taken into account using a modified multiple imputation method and the number of clusters is estimated for each imputed dataset. A simulation study was conducted to assess the performance in terms of the number of clusters, the percentage of unclassified observations, and the adjusted Rand index. The simulation results showed that the investigated method works well compared to several alternative approaches. A real-world application in breast cancer patients showed that the proposed method may reveal novel clusters of patients.
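The abstract names the adjusted Rand index as one of the performance criteria. As an illustration only, the sketch below computes the standard pair-counting form of that index (Hubert and Arabie) for two label vectors. The function name `adjusted_rand_index` and the handling of degenerate partitions are assumptions made for this example and are not taken from the paper.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two partitions of the same observations.

    Hypothetical helper for illustration; not the authors' implementation.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("label vectors must have the same length")
    n = len(labels_a)
    if n < 2:
        raise ValueError("at least two observations are required")

    # Contingency counts: joint cells, and the margins of each partition.
    cells = Counter(zip(labels_a, labels_b))
    rows = Counter(labels_a)
    cols = Counter(labels_b)

    # Number of observation pairs placed together in each cell / cluster.
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())

    expected = sum_rows * sum_cols / comb(n, 2)  # agreement expected by chance
    max_index = (sum_rows + sum_cols) / 2        # agreement at perfect match

    if max_index == expected:
        # Degenerate case (e.g. both partitions are a single cluster):
        # treated as perfect agreement here, by assumption.
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


if __name__ == "__main__":
    truth = [0, 0, 1, 1]
    print(adjusted_rand_index(truth, [1, 1, 0, 0]))  # same partition, relabeled
    print(adjusted_rand_index(truth, [0, 1, 0, 1]))  # partition crossing the truth
```

The index is invariant to how clusters are labeled, which is why it suits comparing an estimated partition with a simulated ground truth when the number of clusters may differ between the two.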
Pages: 372-393
Number of pages: 22