Missing value imputation using a fuzzy clustering-based EM approach

Cited by: 48
Authors
Rahman, Md. Geaur [1]
Islam, Md Zahidul [1]
Affiliations
[1] Charles Sturt Univ, Sch Comp & Math, Ctr Res Complex Syst CRiCS, Bathurst, NSW 2795, Australia
Keywords
Data preprocessing; Data cleansing; Data quality; Missing value imputation; Fuzzy clustering; Gene-expression data; Outlier detection; Selection; Algorithms; FCM
DOI
10.1007/s10115-015-0822-y
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification code
081104; 0812; 0835; 1405
Abstract
Data preprocessing and cleansing play a vital role in data mining by ensuring good data quality. Data-cleansing tasks include imputing missing values, identifying outliers, and identifying and correcting noisy data. In this paper, we present a novel technique called A Fuzzy Expectation Maximization and Fuzzy Clustering-based Missing Value Imputation Framework for Data Pre-processing (FEMI). It imputes numerical and categorical missing values by making an educated guess based on records that are similar to the record with the missing value. To identify a group of similar records and make a guess based on that group, it applies a fuzzy clustering approach and our novel fuzzy expectation maximization algorithm. We evaluate FEMI on eight publicly available natural data sets by comparing its performance with that of five high-quality existing techniques, namely EMI, GkNN, FKMI, SVR, and IBLLS. We use thirty-two types (patterns) of missing values for each data set. Two evaluation criteria, namely root mean squared error and mean absolute error, are used. Our experimental results indicate (according to a confidence interval and test analysis) that FEMI performs significantly better than EMI, GkNN, FKMI, SVR, and IBLLS.
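As a rough illustration of the ideas named in the abstract, the following Python sketch fills missing numerical values using fuzzy membership degrees and scores the result with root mean squared error and mean absolute error. It is not the FEMI algorithm: it assumes the cluster centres are already known, uses the standard fuzzy c-means membership formula in place of the paper's fuzzy expectation maximization step, and ignores categorical attributes. The function names (`fuzzy_memberships`, `impute_membership_weighted`, `rmse`, `mae`) and the fuzzifier default `m=2.0` are hypothetical choices made for this example.

```python
import numpy as np

def rmse(actual, imputed):
    """Root mean squared error between true and imputed values."""
    diff = np.asarray(actual, dtype=float) - np.asarray(imputed, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

def mae(actual, imputed):
    """Mean absolute error between true and imputed values."""
    diff = np.asarray(actual, dtype=float) - np.asarray(imputed, dtype=float)
    return float(np.mean(np.abs(diff)))

def fuzzy_memberships(points, centers, m=2.0):
    """Standard fuzzy c-means membership of each point in each cluster.

    points: (n, d) array, centers: (c, d) array, m > 1 is the fuzzifier.
    Returns an (n, c) array whose rows sum to 1.
    """
    dist = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    dist = np.maximum(dist, 1e-12)          # avoid division by zero
    inv = dist ** (-2.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)

def impute_membership_weighted(data, centers, m=2.0):
    """Fill NaNs with a membership-weighted average of cluster centres.

    Assumption for this sketch: centres are given in advance, and the
    memberships of a record are computed from its observed attributes only.
    """
    data = np.asarray(data, dtype=float)
    centers = np.asarray(centers, dtype=float)
    filled = data.copy()
    for i, row in enumerate(data):
        missing = np.isnan(row)
        if not missing.any():
            continue
        observed = ~missing
        u = fuzzy_memberships(row[observed][None, :], centers[:, observed], m)[0]
        filled[i, missing] = u @ centers[:, missing]
    return filled

# Toy usage: two assumed centres and records with one missing value each.
centers = np.array([[0.0, 0.0], [10.0, 10.0]])
records = np.array([[0.0, np.nan], [5.0, np.nan], [10.0, 10.0]])
completed = impute_membership_weighted(records, centers)
```

A record that sits exactly on a centre takes that centre's values, while a record halfway between two centres receives their average. The `rmse` and `mae` helpers can then compare imputed values against held-out true values, which is the kind of comparison the abstract describes.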
Pages: 389 - 422
Page count: 34
Related papers
50 in total
  • [1] Missing value imputation using a fuzzy clustering-based EM approach
    Md. Geaur Rahman
    Md Zahidul Islam
    [J]. Knowledge and Information Systems, 2016, 46 : 389 - 422
  • [2] Clustering-Based Hybrid Approach for Multivariate Missing Data Imputation
    Dubey, Aditya
    Rasool, Akhtar
    [J]. INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2020, 11 (11) : 710 - 714
  • [3] A Novel Fuzzy Rough Clustering Parameter-based missing value imputation
    P. S. Raja
    K. Sasirekha
    K. Thangavel
    [J]. Neural Computing and Applications, 2020, 32 : 10033 - 10050
  • [4] A Novel Fuzzy Rough Clustering Parameter-based missing value imputation
    Raja, P. S.
    Sasirekha, K.
    Thangavel, K.
    [J]. NEURAL COMPUTING & APPLICATIONS, 2020, 32 (14) : 10033 - 10050
  • [5] Soft Clustering Based Missing Value Imputation
    Raja, P. S.
    Thangavel, K.
    [J]. DIGITAL CONNECTIVITY - SOCIAL IMPACT, 2016, 679 : 119 - 133
  • [6] Imputation of Missing Data Using Fuzzy Neighborhood Density-Based Clustering
    Razavi-Far, Roozbeh
    Saif, Mehrdad
    [J]. 2016 IEEE INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS (FUZZ-IEEE), 2016 : 1834 - 1841
  • [7] Fuzzy Clustering-Based Approach for Outlier Detection
    Al-Zoubi, Moh'd Belal
    Ali, Al-Dahoud
    Yahya, Abdelfatah A.
    [J]. RECENT ADVANCES AND APPLICATIONS OF COMPUTER ENGINEERING: PROCEEDINGS OF THE 9TH WSEAS INTERNATIONAL CONFERENCE (ACE 10), 2010 : 192 - +
  • [8] A novel clustering-based purity and distance imputation for handling medical data with missing values
    Ching-Hsue Cheng
    Shu-Fen Huang
    [J]. Soft Computing, 2021, 25 : 11781 - 11801
  • [9] A novel clustering-based purity and distance imputation for handling medical data with missing values
    Cheng, Ching-Hsue
    Huang, Shu-Fen
    [J]. SOFT COMPUTING, 2021, 25 (17) : 11781 - 11801
  • [10] A class center based approach for missing value imputation
    Tsai, Chih-Fong
    Li, Miao-Ling
    Lin, Wei-Chao
    [J]. KNOWLEDGE-BASED SYSTEMS, 2018, 151 : 124 - 135