COLI: Collaborative clustering missing data imputation

Cited by: 7
Authors
Wan, Daoming [1]
Razavi-Far, Roozbeh [1, 2]
Saif, Mehrdad [1]
Mozafari, Niloofar [3]
Affiliations
[1] Univ Windsor, Dept Elect & Comp Engn, 401 Sunset Ave, Windsor, ON N9B 3P4, Canada
[2] Univ Windsor, Sch Comp Sci, 401 Sunset Ave, Windsor, ON N9B 3P4, Canada
[3] Reg Informat Ctr Sci & Technol RICeST, Dept Design & Syst Operat, Shiraz, Iran
Keywords
Missing data imputation; Collaborative clustering; Data amputation
DOI
10.1016/j.patrec.2021.11.011
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Missing data imputation plays an important role in the data cleansing process. Clustering algorithms have been widely used for missing data imputation, yet there is little research on the use of clustering ensembles, which aggregate multiple clustering results, for this task. This paper proposes a novel collaborative clustering-based imputation method, called COLI, which uses the imputation quality as a key criterion for the exchange of information between different clustering results. To the best of our knowledge, this is the first study on the impact of collaborative clustering on imputation performance. The main contributions of this paper are three-fold. First, a novel missing value imputation method based on collaborative clustering is proposed. Second, three amputation strategies are used to induce missingness on various complete and publicly available numerical datasets, which allows evaluating the imputation quality of the proposed method under different missingness mechanisms, distributions, and ratios. Third, the proposed method is compared to several state-of-the-art imputation methods, and the attained results demonstrate that it is effective for handling missing data. (c) 2021 Elsevier B.V. All rights reserved.
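The evaluation loop described in the abstract (ampute a complete dataset, impute it with a clustering-based method, score the result against the known values) can be illustrated with a minimal sketch. This is not the COLI algorithm: it uses a single k-means-style clustering with no collaboration between clusterings, and only one amputation mechanism (missing completely at random), chosen here as an assumption. The function names `ampute_mcar`, `cluster_mean_impute`, and `rmse`, the synthetic two-blob data, and all parameter values are hypothetical.

```python
import numpy as np


def ampute_mcar(X, ratio, rng):
    """Set roughly `ratio` of the entries to NaN, completely at random (assumed mechanism)."""
    X_miss = X.astype(float).copy()
    mask = rng.random(X.shape) < ratio
    # Keep at least one observed value in every row.
    mask[mask.all(axis=1), 0] = False
    X_miss[mask] = np.nan
    return X_miss, mask


def cluster_mean_impute(X_miss, k, rng, n_iter=20):
    """Alternate between k-means-style clustering and refilling missing
    entries with the centroid of the assigned cluster."""
    missing = np.isnan(X_miss)
    # Initial fill: column means over the observed entries.
    X = np.where(missing, np.nanmean(X_miss, axis=0), X_miss)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            members = labels == j
            if members.any():
                centers[j] = X[members].mean(axis=0)
        # Observed entries are never changed; only missing ones are refilled.
        X = np.where(missing, centers[labels], X_miss)
    return X


def rmse(X_true, X_imp, mask):
    """Root-mean-square error over the amputed entries only."""
    return float(np.sqrt(np.mean((X_true[mask] - X_imp[mask]) ** 2)))


rng = np.random.default_rng(0)
# Synthetic complete dataset: two Gaussian blobs in three dimensions.
X_true = np.vstack([rng.normal(0.0, 1.0, (100, 3)),
                    rng.normal(10.0, 1.0, (100, 3))])
X_miss, mask = ampute_mcar(X_true, ratio=0.2, rng=rng)
X_imp = cluster_mean_impute(X_miss, k=2, rng=rng)
score = rmse(X_true, X_imp, mask)
print(f"RMSE on amputed entries: {score:.3f}")
```

Because the complete data are known before amputation, the error is computed only on the entries that were deliberately removed, which is what makes a controlled comparison across missingness ratios possible.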
Pages: 420-427
Page count: 8