An Improved Mean Imputation Clustering Algorithm for Incomplete Data

Cited by: 0
Authors
Hong Shi
Pingxin Wang
Xin Yang
Hualong Yu
Affiliations
[1] Jiangsu University of Science and Technology,School of Computer Science
[2] Jiangsu University of Science and Technology,School of Science
[3] Hebei Normal University,College of Mathematics and Information Science
Source
Neural Processing Letters | 2022 / Vol. 54
Keywords
Incomplete data; Mean imputation; K-means; Validity index
DOI
Not available
Abstract
Incomplete data sets arise in all fields of scientific study because of random noise, data loss, limitations of data acquisition, misunderstanding of data, and similar causes. Most clustering algorithms cannot be applied to incomplete data sets directly, because objects with missing values must be preprocessed first. For this reason, this paper presents an improved mean imputation clustering algorithm for incomplete data, based on a partition clustering algorithm. The proposed method divides the universe into two sets: the set of objects with no missing values and the set of objects with missing values. First, the objects with no missing values are clustered by a traditional clustering algorithm. Then, for each object with missing values, each cluster obtained from the complete objects supplies one candidate imputation, in which the missing attribute value is filled with that cluster's mean value of the attribute. Perturbation analysis of the cluster centroids is applied to search for the optimal imputation. The clustering results on several UCI data sets are evaluated by several validity indexes, and these results support the effectiveness of the proposed algorithm.
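The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not define the perturbation criterion, so the sketch assumes that the best candidate is the one whose insertion moves its cluster's centroid the least. The function names (`kmeans`, `impute`), the seeding of centroids with the first k objects, and the toy data are all hypothetical.

```python
import math


def mean(points):
    """Column-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(points) for col in zip(*points)]


def assign(points, centroids):
    """Group points by their nearest centroid (Euclidean distance)."""
    clusters = [[] for _ in centroids]
    for p in points:
        j = min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
        clusters[j].append(p)
    return clusters


def kmeans(points, k, iters=50):
    """Plain Lloyd's k-means on complete objects.

    Assumption: the first k points seed the centroids, for determinism.
    """
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        clusters = assign(points, centroids)
        new = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if new == centroids:
            break
        centroids = new
    return centroids, assign(points, centroids)


def impute(obj, centroids, clusters):
    """Fill None entries of obj with a cluster's attribute means.

    One candidate is built per cluster. Assumed criterion (not from the
    paper): keep the candidate that shifts its cluster's centroid least.
    """
    best, best_shift = None, math.inf
    for centroid, members in zip(centroids, clusters):
        if not members:
            continue  # an empty cluster has no attribute means to offer
        candidate = [centroid[i] if v is None else v for i, v in enumerate(obj)]
        shift = math.dist(centroid, mean(members + [candidate]))
        if shift < best_shift:
            best, best_shift = candidate, shift
    return best


# Toy data: two well-separated groups of complete objects.
complete = [[1.0, 1.0], [5.0, 5.0], [1.2, 0.8], [0.8, 1.2], [5.2, 4.8], [4.8, 5.2]]
centroids, clusters = kmeans(complete, 2)

# An incomplete object whose first attribute is missing.
filled = impute([None, 5.1], centroids, clusters)
print(filled)
```

In this toy run the observed attribute of the incomplete object lies close to the second group, so the candidate built from that group's mean perturbs its centroid least and is kept. A different perturbation measure would slot into `impute` without changing the rest of the sketch.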
Pages: 3537 - 3550
Page count: 13
Related papers
50 in total
  • [1] An Improved Mean Imputation Clustering Algorithm for Incomplete Data
    Shi, Hong
    Wang, Pingxin
    Yang, Xin
    Yu, Hualong
    NEURAL PROCESSING LETTERS, 2022, 54 (05) : 3537 - 3550
  • [2] An Imputation-Based Method for Fuzzy Clustering of Incomplete Data
    Soni, S.
    Sharma, I.
    2017 INTERNATIONAL CONFERENCE ON COMMUNICATION AND SIGNAL PROCESSING (ICCSP), 2017, : 616 - 621
  • [3] Towards clustering of incomplete microarray data without the use of imputation
    Kim, Dae-Won
    Lee, Ki-Young
    Lee, Kwang H.
    Lee, Doheon
    BIOINFORMATICS, 2007, 23 (01) : 107 - 113
  • [4] Incomplete high-dimensional data imputation algorithm using feature selection and clustering analysis on cloud
    Bu, Fanyu
    Chen, Zhikui
    Zhang, Qingchen
    Yang, Laurence T.
    JOURNAL OF SUPERCOMPUTING, 2016, 72 (08): : 2977 - 2990
  • [5] Incomplete high-dimensional data imputation algorithm using feature selection and clustering analysis on cloud
    Fanyu Bu
    Zhikui Chen
    Qingchen Zhang
    Laurence T. Yang
    The Journal of Supercomputing, 2016, 72 : 2977 - 2990
  • [6] Multiple imputation confidence intervals for the mean of the discrete distributions for incomplete data
    Lee, Chung-Han
    Wang, Hsiuying
    STATISTICS IN MEDICINE, 2022, 41 (07) : 1172 - 1190
  • [7] An improved imputation method for incomplete GxE trial data for asparagus
    Nichols, MA
    Godfrey, AR
    Wood, GJR
    Qiao, CG
    Ganesalingam, S
    PROCEEDINGS OF THE TENTH INTERNATIONAL ASPARAGUS SYMPOSIUM, 2002, (589): : 111 - 116
  • [8] Extended mean field annealing for clustering incomplete data
    Wu, Jun
    Song, Chi-Hwa
    Kong, Jung Min
    Lee, Won Don
    2007 INTERNATIONAL SYMPOSIUM ON INFORMATION TECHNOLOGY CONVERGENCE, PROCEEDINGS, 2007, : 8 - 12
  • [9] Fuzzy Clustering and Nonlinear Regression Imputation for Incomplete Data of Tunnel Boring Machine
    Wang Y.
    Pang Y.
    Zhang L.
    Shi Y.
    Sun W.
    Song X.
    Jixie Gongcheng Xuebao/Journal of Mechanical Engineering, 2023, 59 (12): : 28 - 37
  • [10] Incomplete clustering analysis via multiple imputation
    Lee, Jung Wun
    Harel, Ofer
    JOURNAL OF APPLIED STATISTICS, 2023, 50 (09) : 1962 - 1979