An Improved Mean Imputation Clustering Algorithm for Incomplete Data

Cited by: 0
Authors
Hong Shi
Pingxin Wang
Xin Yang
Hualong Yu
Affiliations
[1] Jiangsu University of Science and Technology,School of Computer Science
[2] Jiangsu University of Science and Technology,School of Science
[3] Hebei Normal University,College of Mathematics and Information Science
Source
Neural Processing Letters | 2022 / Vol. 54
Keywords
Incomplete data; Mean imputation; K-means; Validity index
DOI
Not available
Abstract
Incomplete data sets arise in many fields of scientific study because of random noise, data loss, limitations of data acquisition, misinterpretation of data, and similar causes. Most clustering algorithms cannot be applied directly to incomplete data sets, because objects with missing values must be preprocessed first. This paper therefore presents an improved mean imputation clustering algorithm for incomplete data, built on a partition clustering algorithm. The proposed method divides the universe into two sets: objects with no missing values and objects with missing values. First, the objects with no missing values are clustered by a traditional clustering algorithm. Then, for each object with missing values, each missing attribute value is filled with the mean of that attribute in each cluster in turn, using the clustering of the objects with no missing values. Perturbation analysis of the cluster centroids is applied to search for the optimal imputation. Clustering results on several UCI data sets are evaluated with several validity indexes and demonstrate the effectiveness of the proposed algorithm.
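The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The abstract does not specify the perturbation criterion, so the sketch assumes one: among the per-cluster candidate imputations, keep the one whose insertion moves its cluster centroid the least. The function names `kmeans` and `impute_and_assign`, the use of `None` to mark missing values, and the toy data are all hypothetical.

```python
import math
import random


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means on a list of equal-length numeric rows."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Recompute each centroid as the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


def impute_and_assign(data, k, seed=0):
    """Cluster the complete rows, then fill each incomplete row.

    Returns a list of (filled_row, cluster_index) for the incomplete rows.
    """
    complete = [row for row in data if None not in row]
    centroids, labels = kmeans(complete, k, seed=seed)
    sizes = [labels.count(c) for c in range(k)]
    results = []
    for row in data:
        if None not in row:
            continue
        best = None
        for c in range(k):
            # Candidate imputation: use cluster c's attribute means.
            filled = [centroids[c][j] if v is None else v
                      for j, v in enumerate(row)]
            # ASSUMED criterion (not from the paper): adding x to a cluster of
            # n points with mean mu moves the centroid by ||x - mu|| / (n + 1).
            shift = math.dist(filled, centroids[c]) / (sizes[c] + 1)
            if best is None or shift < best[0]:
                best = (shift, c, filled)
        results.append((best[2], best[1]))
    return results


# Toy data: two well-separated groups plus two rows with a missing value.
demo = [[0, 0], [0, 1], [1, 0], [1, 1],
        [10, 10], [10, 11], [11, 10], [11, 11],
        [None, 10.5], [0.5, None]]
demo_results = impute_and_assign(demo, k=2)
for filled, cluster in demo_results:
    print(filled, "-> cluster", cluster)
```

Because each candidate copies the cluster means into the missing positions, the assumed criterion here depends only on the observed attributes and the cluster size. The perturbation analysis in the paper may weigh candidates differently.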
Pages: 3537 - 3550
Number of pages: 13
Related papers
50 in total
  • [21] Incomplete big data imputation mining algorithm based on BP neural network
    Liu, Yutang
    JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2019, 37 (04) : 4457 - 4466
  • [22] A new iterative fuzzy clustering algorithm for multiple imputation of missing data
    Nikfalazar, Sanaz
    Yeh, Chung-Hsing
    Bedingfield, Susan
    Khorshidi, Hadi A.
    2017 IEEE INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS (FUZZ-IEEE), 2017,
  • [23] A Hybrid Method for Incomplete Data Imputation
    Zhao, Liang
    Chen, Zhikui
    Yang, Zhennan
    Hu, Yueming
    2015 IEEE 17TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING AND COMMUNICATIONS, 2015 IEEE 7TH INTERNATIONAL SYMPOSIUM ON CYBERSPACE SAFETY AND SECURITY, AND 2015 IEEE 12TH INTERNATIONAL CONFERENCE ON EMBEDDED SOFTWARE AND SYSTEMS (ICESS), 2015, : 1725 - 1730
  • [24] A Pseudo-EM Algorithm for Clustering Incomplete Longitudinal Data
    Shaikh, Mateen
    McNicholas, Paul D.
    Desmond, Anthony F.
    INTERNATIONAL JOURNAL OF BIOSTATISTICS, 2010, 6 (01):
  • [25] An improved incomplete AP clustering algorithm based on K nearest neighbours
    Chen, Zhikui
    Leng, Yonglin
    Hu, Yueming
    INTERNATIONAL JOURNAL OF EMBEDDED SYSTEMS, 2019, 11 (03) : 269 - 277
  • [26] Greedy Algorithm for Subspace Clustering from Corrupted and Incomplete Data
    Petukhov, Alexander
    Kozlov, Inna
    2015 INTERNATIONAL CONFERENCE ON SAMPLING THEORY AND APPLICATIONS (SAMPTA), 2015, : 458 - 462
  • [27] A Three-Way Decisions Clustering Algorithm for Incomplete Data
    Yu, Hong
    Su, Ting
    Zeng, Xianhua
    ROUGH SETS AND KNOWLEDGE TECHNOLOGY, RSKT 2014, 2014, 8818 : 765 - 776
  • [28] Power Incomplete Data Clustering Based on Fuzzy Fusion Algorithm
    Hong Y.
    Yan Y.
    Energy Engineering: Journal of the Association of Energy Engineering, 2023, 120 (01) : 245 - 261
  • [29] A Novel Mean-Shift Algorithm for Data Clustering
    Cariou, Claude
    Le Moan, Steven
    Chehdi, Kacem
    IEEE ACCESS, 2022, 10 : 14575 - 14585
  • [30] Analysis of incomplete climate data: Estimation of mean values and covariance matrices and imputation of missing values
    Schneider, T
    JOURNAL OF CLIMATE, 2001, 14 (05) : 853 - 871