Missing value imputation using a fuzzy clustering-based EM approach

Cited by: 48
Authors
Rahman, Md. Geaur [1]
Islam, Md Zahidul [1]
Affiliation
[1] Charles Sturt Univ, Sch Comp & Math, Ctr Res Complex Syst CRiCS, Bathurst, NSW 2795, Australia
Keywords
Data preprocessing; Data cleansing; Data quality; Missing value imputation; Fuzzy clustering; GENE-EXPRESSION DATA; OUTLIER DETECTION; SELECTION; ALGORITHMS; FCM
DOI
10.1007/s10115-015-0822-y
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Data preprocessing and cleansing play a vital role in data mining by ensuring good data quality. Data-cleansing tasks include imputation of missing values, identification of outliers, and identification and correction of noisy data. In this paper, we present a novel technique called A Fuzzy Expectation Maximization and Fuzzy Clustering-based Missing Value Imputation Framework for Data Pre-processing (FEMI). It imputes numerical and categorical missing values by making an educated guess based on records that are similar to the record having a missing value. To identify a group of similar records and make a guess based on that group, it applies a fuzzy clustering approach and our novel fuzzy expectation maximization algorithm. We evaluate FEMI on eight publicly available natural data sets by comparing its performance with that of five high-quality existing techniques, namely EMI, GkNN, FKMI, SVR, and IBLLS. We use thirty-two types (patterns) of missing values for each data set. Two evaluation criteria, namely root mean squared error and mean absolute error, are used. Our experimental results indicate (according to a confidence interval and test analysis) that FEMI performs significantly better than EMI, GkNN, FKMI, SVR, and IBLLS.
Pages: 389-422
Number of pages: 34
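
Illustrative sketch (not the authors' FEMI algorithm): the abstract describes imputing a missing value from records similar to the incomplete one, using fuzzy clustering, and scoring the result with root mean squared error and mean absolute error. The Python below shows that general idea under stated assumptions: cluster centers are already known, all attributes are numerical, memberships follow the standard fuzzy c-means formula with fuzzifier m, and a missing value is filled with the membership-weighted average of the cluster centers. The function names memberships, impute, rmse, and mae are hypothetical and do not come from the paper.

```python
import math


def memberships(record, centers, m=2.0):
    """Standard fuzzy c-means memberships for one record.

    Distances use only the attributes that are observed (not None).
    Assumption: `centers` were obtained beforehand, e.g. by fuzzy c-means.
    """
    observed = [j for j, v in enumerate(record) if v is not None]
    dists = [
        math.sqrt(sum((record[j] - c[j]) ** 2 for j in observed))
        for c in centers
    ]
    # A record that coincides with a center belongs fully to that center.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    exponent = 2.0 / (m - 1.0)
    return [
        1.0 / sum((d_i / d_k) ** exponent for d_k in dists)
        for d_i in dists
    ]


def impute(record, centers, m=2.0):
    """Fill each missing value with the membership-weighted center value."""
    u = memberships(record, centers, m)
    return [
        v if v is not None
        else sum(u_k * c[j] for u_k, c in zip(u, centers))
        for j, v in enumerate(record)
    ]


def rmse(actual, imputed):
    """Root mean squared error between true and imputed values."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, imputed)) / n)


def mae(actual, imputed):
    """Mean absolute error between true and imputed values."""
    return sum(abs(a - p) for a, p in zip(actual, imputed)) / len(actual)


if __name__ == "__main__":
    centers = [[0.0, 0.0], [10.0, 10.0]]   # hypothetical cluster centers
    incomplete = [2.0, None]               # second attribute is missing
    filled = impute(incomplete, centers)
    print(filled)
    print(rmse([1.0], [filled[1]]), mae([1.0], [filled[1]]))
```

Because the record lies much closer to the first center, its membership in that cluster dominates and the imputed value is pulled toward that center's value. Categorical attributes and the expectation maximization step that the abstract mentions are outside this sketch.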