Missing data imputation with fuzzy feature selection for diabetes dataset

Cited by: 25
Authors
Dzulkalnine, Mohamad Faiz [1 ]
Sallehuddin, Roselina [1 ]
Affiliations
[1] Univ Teknol Malaysia, Fac Comp, Skudai 81300, Johor, Malaysia
Source
SN APPLIED SCIENCES | 2019 / Vol. 1 / Issue 04
Keywords
Missing data; Fuzzy feature selection; Imputation; Classification; SUPPORT VECTOR MACHINES; ALGORITHMS; MODEL;
DOI
10.1007/s42452-019-0383-x
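Chinese Library Classification number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification codes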
07 ; 0710 ; 09 ;
Abstract
Missing data remain a difficulty for data analysis in many research fields, especially medicine, where they affect the diagnosis and treatment a patient receives. In this research, fuzzy c-means (FCM) is used to impute the missing data. However, like most data imputation methods, FCM does not consider the presence of irrelevant features. Irrelevant features can increase the computational time of the imputation process and decrease the accuracy of the prediction. Feature selection techniques can alleviate this problem by selecting the most relevant features and reducing the dataset size. Fuzzy principal component analysis (FPCA) is used as the feature selection method in this study because, unlike classical PCA, it accounts for outliers, which are the main reason some features become irrelevant. Therefore, an improved hybrid imputation model, FPCA-support vector machines-FCM (FPCA-SVM-FCM), is proposed and employed in this study. The efficiency of the proposed model is investigated on one dataset, the Pima Indians Diabetes dataset. Experimental results showed that the proposed hybrid imputation model outperforms the existing methods, producing more accurate estimates in terms of accuracy, RMSE and MAE. The proposed method was also validated using the Wilcoxon rank-sum test and Theil's U test, and obtained good results compared with SVM-FCM. Therefore, it can be used as an alternative tool for handling missing data in order to obtain a better-quality dataset.
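As a rough illustration of the FCM imputation step and the RMSE/MAE scoring named in the abstract, the following Python sketch may help. It is not the authors' FPCA-SVM-FCM model: the FPCA feature selection and SVM stages are omitted, and the function names (`fcm_impute`, `rmse`, `mae`), the parameter defaults, and the synthetic demo data are all assumptions made for this example.

```python
import numpy as np


def fcm_impute(X, n_clusters=2, m=2.0, n_iter=50, seed=0):
    """Fill NaNs in X with fuzzy c-means estimates (illustrative sketch only)."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    # Start from the column means of the observed values.
    filled = np.where(missing, np.nanmean(X, axis=0), X)
    rng = np.random.default_rng(seed)
    # Random initial memberships; each row sums to 1.
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # Cluster centers: means weighted by memberships raised to the fuzzifier m.
        W = U ** m
        centers = (W.T @ filled) / W.sum(axis=0)[:, None]
        # Euclidean distance from every sample to every center.
        d = np.linalg.norm(filled[:, None, :] - centers[None, :, :], axis=2)
        d = np.maximum(d, 1e-12)  # avoid division by zero
        # Standard FCM membership update.
        inv = d ** (-2.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)
        # Re-estimate only the missing cells as a membership-weighted
        # average of the cluster centers.
        estimate = U @ centers
        filled[missing] = estimate[missing]
    return filled


def rmse(actual, predicted):
    """Root mean squared error."""
    diff = np.asarray(actual, dtype=float) - np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


def mae(actual, predicted):
    """Mean absolute error."""
    diff = np.asarray(actual, dtype=float) - np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(diff)))


if __name__ == "__main__":
    # Synthetic two-cluster data (an assumption; not the Pima dataset).
    rng = np.random.default_rng(1)
    truth = np.vstack([rng.normal(0.0, 1.0, (30, 3)),
                       rng.normal(6.0, 1.0, (30, 3))])
    holes = rng.random(truth.shape) < 0.1  # hide roughly 10% of the cells
    observed = np.where(holes, np.nan, truth)
    imputed = fcm_impute(observed, n_clusters=2)
    print("RMSE on hidden cells:", rmse(truth[holes], imputed[holes]))
    print("MAE on hidden cells:", mae(truth[holes], imputed[holes]))
```

Errors are scored only on the cells that were hidden, since observed cells are left untouched by the imputation.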
Pages: 12
Related papers
50 in total
  • [1] Missing data imputation with fuzzy feature selection for diabetes dataset
    Mohamad Faiz Dzulkalnine
    Roselina Sallehuddin
    SN Applied Sciences, 2019, 1
  • [2] Fuzzy rough assisted missing value imputation and feature selection
    Jain, Pankhuri
    Tiwari, Anoop
    Som, Tanmoy
    NEURAL COMPUTING & APPLICATIONS, 2023, 35 (03): : 2773 - 2793
  • [3] Fuzzy rough assisted missing value imputation and feature selection
    Pankhuri Jain
    Anoop Tiwari
    Tanmoy Som
    Neural Computing and Applications, 2023, 35 : 2773 - 2793
  • [4] Missing data imputation, prediction, and feature selection in diagnosis of vaginal prolapse
    Mingxuan Fan
    Xiaoling Peng
    Xiaoyu Niu
    Tao Cui
    Qiaolin He
    BMC Medical Research Methodology, 23
  • [5] Missing data imputation, prediction, and feature selection in diagnosis of vaginal prolapse
    Fan, Mingxuan
    Peng, Xiaoling
    Niu, Xiaoyu
    Cui, Tao
    He, Qiaolin
    BMC MEDICAL RESEARCH METHODOLOGY, 2023, 23 (01)
  • [6] Treating missing data in a clinical neuropsychological dataset: Data imputation
    Närhi, V
    Laaksonen, S
    Hietala, R
    Ahonen, T
    Lyyti, H
    CLINICAL NEUROPSYCHOLOGIST, 2001, 15 (03): : 380 - 392
  • [7] Evaluating Imputation Methods for Missing Data in a MCI Dataset
    Gomez-Valades Batanero, Alba
    Rincon Zamorano, Mariano
    Martinez Tomas, Rafael
    Guerrero Martin, Juan
    ARTIFICIAL INTELLIGENCE IN NEUROSCIENCE: AFFECTIVE ANALYSIS AND HEALTH APPLICATIONS, PT I, 2022, 13258 : 446 - 454
  • [8] Iterative Fuzzy C Means, Fuzzy Silhouette, and Imputation for Missing Values in a Dataset
    Mausor, Farahida Hanim
    Jaafar, Jafreezal
    Taib, Shakirah Mohd
    Razali, Razulaimi
    2021 IEEE INTERNATIONAL CONFERENCE ON COMPUTING (ICOCO), 2021, : 382 - 385
  • [9] EvoImputer: An evolutionary approach for Missing Data Imputation and feature selection in the context of supervised learning
    Awawdeh, Shatha
    Faris, Hossam
    Hiary, Hazem
    KNOWLEDGE-BASED SYSTEMS, 2022, 236
  • [10] The Feature Selection Effect on Missing Value Imputation of Medical Datasets
    Liu, Chia-Hui
    Tsai, Chih-Fong
    Sue, Kuen-Liang
    Huang, Min-Wei
    APPLIED SCIENCES-BASEL, 2020, 10 (07):