Incomplete high-dimensional data imputation algorithm using feature selection and clustering analysis on cloud

被引:15
|
作者
Bu, Fanyu [1 ,2 ]
Chen, Zhikui [1 ]
Zhang, Qingchen [1 ]
Yang, Laurence T. [3 ]
机构
[1] Dalian Univ Technol, Sch Software Technol, Dalian 116620, Peoples R China
[2] Inner Mongolia Univ Finance & Econ, Coll Vocat, Hohhot 010010, Peoples R China
[3] St Francis Xavier Univ, Dept Comp Sci, Antigonish, NS B2G 2W5, Canada
来源
JOURNAL OF SUPERCOMPUTING | 2016年 / 72卷 / 08期
关键词
High-dimensional data; Incomplete data imputation; Feature subset selection; Clustering analysis; SUPPORT VECTOR REGRESSION; C-MEANS; VALUES;
D O I
10.1007/s11227-015-1433-9
中图分类号
TP3 [计算技术、计算机技术];
学科分类号
0812 ;
摘要
Incomplete data imputation plays an important role in big data analysis and smart computing. Existing algorithms are of low efficiency and effectiveness in imputing incomplete high-dimensional data. The paper proposes an incomplete high-dimensional data imputation algorithm based on feature selection and cluster analysis (IHDIFC), which works in three steps. First, a hierarchical clustering-based feature subset selection algorithm is designed to reduce the dimensions of the data set. Second, a parallel -means algorithm based on partial distance is derived to cluster the selected data subset efficiently. Finally, the data objects in the same cluster with the target are utilized to estimate its missing feature values. Extensive experiments are carried out to compare IHDIFC to two representative missing data imputation algorithms, namely FIMUS and DMI. The results demonstrate that the proposed algorithm achieves better imputation accuracy and takes significantly less time than other algorithms for imputing high-dimensional data.
引用
收藏
页码:2977 / 2990
页数:14
相关论文
共 50 条
  • [1] Incomplete high-dimensional data imputation algorithm using feature selection and clustering analysis on cloud
    Fanyu Bu
    Zhikui Chen
    Qingchen Zhang
    Laurence T. Yang
    The Journal of Supercomputing, 2016, 72 : 2977 - 2990
  • [2] A density-based clustering algorithm for high-dimensional data with feature selection
    Qi Xianting
    Wang Pan
    2016 2ND INTERNATIONAL CONFERENCE ON INDUSTRIAL INFORMATICS - COMPUTING TECHNOLOGY, INTELLIGENT TECHNOLOGY, INDUSTRIAL INFORMATION INTEGRATION (ICIICII), 2016, : 114 - 118
  • [3] Clustering high-dimensional data via feature selection
    Liu, Tianqi
    Lu, Yu
    Zhu, Biqing
    Zhao, Hongyu
    BIOMETRICS, 2023, 79 (02) : 940 - 950
  • [4] Multiple imputation and analysis for high-dimensional incomplete proteomics data
    Yin, Xiaoyan
    Levy, Daniel
    Willinger, Christine
    Adourian, Aram
    Larson, Martin G.
    STATISTICS IN MEDICINE, 2016, 35 (08) : 1315 - 1326
  • [5] Feature Selection and Classification for High-Dimensional Incomplete Multimodal Data
    Deng, Wan-Yu
    Liu, Dan
    Dong, Ying-Ying
    MATHEMATICAL PROBLEMS IN ENGINEERING, 2018, 2018
  • [6] A Fast Clustering-Based Feature Subset Selection Algorithm for High-Dimensional Data
    Song, Qinbao
    Ni, Jingjie
    Wang, Guangtao
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2013, 25 (01) : 1 - 14
  • [7] FEATURE SELECTION FOR HIGH-DIMENSIONAL DATA ANALYSIS
    Verleysen, Michel
    NCTA 2011: PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON NEURAL COMPUTATION THEORY AND APPLICATIONS, 2011, : IS23 - IS25
  • [8] FEATURE SELECTION FOR HIGH-DIMENSIONAL DATA ANALYSIS
    Verleysen, Michel
    ECTA 2011/FCTA 2011: PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON EVOLUTIONARY COMPUTATION THEORY AND APPLICATIONS AND INTERNATIONAL CONFERENCE ON FUZZY COMPUTATION THEORY AND APPLICATIONS, 2011,
  • [9] On online high-dimensional spherical data clustering and feature selection
    Amayri, Ola
    Bouguila, Nizar
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2013, 26 (04) : 1386 - 1398
  • [10] A Clustering Algorithm for High-Dimensional Nonlinear Feature Data with Applications
    Jiang H.
    Wang G.
    Gao J.
    Gao Z.
    Gao R.
    Guo Q.
    Hsi-An Chiao Tung Ta Hsueh/Journal of Xi'an Jiaotong University, 2017, 51 (12): : 49 - 55and90