Clustering with missing and left-censored data: A simulation study comparing multiple-imputation-based procedures

Cited by: 9
Authors
Faucheux, Lilith [1, 2]
Resche-Rigon, Matthieu [1, 3]
Curis, Emmanuel [3, 4]
Soumelis, Vassili [2, 5]
Chevret, Sylvie [1, 3]
Affiliations
[1] Univ Paris, Sorbonne Paris Cite, ECSTRRA Team, INSERM UMR1153, Paris, France
[2] Univ Paris, Sorbonne Paris Cite, INSERM U976, Paris, France
[3] Hop St Louis, AP HP, Serv Biostat & Informat Med, Paris, France
[4] Univ Paris, Sorbonne Paris Cite, Lab Biomath Plateau IB2 EA 7537 BioSTM, Fac Pharm, Paris, France
[5] Hop St Louis, AP HP, Lab Immunol Biol & Histocompatibil, Paris, France
Keywords
breast cancer; clustering; consensus; left-censored data; missing data; multiple imputation; LIMIT; QUANTIFICATION; INFERENCE; IMPACT
DOI
10.1002/bimj.201900366
Chinese Library Classification number
Q [Biological Sciences]
Discipline classification codes
07; 0710; 09
Abstract
Cluster analysis, commonly used to explore large biomedical datasets, can be challenging, notably due to missing data or left-censored data induced by the sensitivity limits of the biochemical measurement method. Usually, complete-case analysis, simple imputation, or stochastic simple imputation are applied before clustering. More recently, consensus methods following multiple imputation have been proposed. However, they ignore left-censoring and do not allow the number of clusters to vary across the partitions of each imputed dataset. Here, we developed a consensus-based clustering algorithm in which left-censored data are taken into account using a modified multiple imputation method and the number of clusters is estimated for each imputed dataset. A simulation study was conducted to assess the performance in terms of the number of clusters, the percentage of unclassified observations, and the adjusted Rand index. The simulation results showed that the investigated method works well compared to several alternative approaches. A real-world application in breast cancer patients showed that the proposed method may reveal novel clusters of patients.
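The consensus step described in the abstract, in which partitions obtained from several imputed datasets are combined even when they contain different numbers of clusters, can be illustrated with a small sketch. This is not the authors' algorithm: the co-association matrix, the agreement threshold, the linking of pairs through connected components, and the names co_association, consensus_clusters, threshold, and min_size are all assumptions chosen for illustration. The imputation and per-dataset clustering steps are assumed to have been run already, so the input is simply one list of cluster labels per imputed dataset.

```python
from itertools import combinations


def co_association(partitions):
    """Return the fraction of partitions in which each pair shares a cluster."""
    n = len(partitions[0])
    counts = [[0.0] * n for _ in range(n)]
    for labels in partitions:
        for i, j in combinations(range(n), 2):
            if labels[i] == labels[j]:
                counts[i][j] += 1
                counts[j][i] += 1
    m = len(partitions)
    return [[c / m for c in row] for row in counts]


def consensus_clusters(partitions, threshold=0.75, min_size=2):
    """Hypothetical consensus rule: link pairs that are co-clustered in at
    least `threshold` of the partitions, then take connected components.
    Components smaller than `min_size` are left unclassified (None)."""
    n = len(partitions[0])
    agree = co_association(partitions)
    parent = list(range(n))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if agree[i][j] >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)

    labels = [None] * n
    next_label = 0
    for members in sorted(groups.values()):
        if len(members) >= min_size:
            for i in members:
                labels[i] = next_label
            next_label += 1
    return labels


# One label list per imputed dataset; the number of clusters differs (2, 3, 3).
partitions = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 2, 2, 2],
    [0, 0, 0, 1, 1, 2],
]
print(consensus_clusters(partitions, threshold=0.75))
# [0, 0, None, 1, 1, None]
```

With the stricter threshold, observations 2 and 5 agree with their neighbours in only two of the three partitions and are therefore left unclassified, which is the kind of quantity the simulation study reports as the percentage of unclassified observations. Lowering the threshold to 0.6 in this toy example assigns every observation to one of two clusters.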
Pages: 372-393
Page count: 22