Missing data imputation with fuzzy feature selection for diabetes dataset

Cited by: 25
Authors
Dzulkalnine, Mohamad Faiz [1]
Sallehuddin, Roselina [1]
Affiliations
[1] Univ Teknol Malaysia, Fac Comp, Skudai 81300, Johor, Malaysia
Source
SN APPLIED SCIENCES | 2019 / Vol. 1 / Issue 04
Keywords
Missing data; Fuzzy feature selection; Imputation; Classification; SUPPORT VECTOR MACHINES; ALGORITHMS; MODEL;
DOI
10.1007/s42452-019-0383-x
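Chinese Library Classification (CLC) codes
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification codes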
07 ; 0710 ; 09 ;
Abstract
Missing data remain a difficulty for data analysis in many research fields, especially medicine, where they affect the diagnosis and treatment a patient receives. In this research, fuzzy c-means (FCM) is used to impute the missing data. However, like most data imputation methods, FCM does not account for irrelevant features, which can increase the computational time of the imputation process and decrease prediction accuracy. Feature selection techniques can alleviate this problem by selecting the most relevant features and reducing the dataset size. Fuzzy principal component analysis (FPCA) is used as the feature selection method in this study because, unlike classical PCA, it accounts for outliers, which are the main reason some features are rendered irrelevant. Therefore, an improved hybrid imputation model, FPCA-support vector machines-FCM (FPCA-SVM-FCM), is proposed and employed in this study. The efficiency of the proposed model is investigated on one dataset, the Pima Indians Diabetes dataset. Experimental results showed that the proposed hybrid imputation model outperforms the existing methods, producing more accurate estimates in terms of accuracy, RMSE and MAE. The proposed method was also validated using the Wilcoxon rank sum test and Theil's U test, and obtained good results compared to SVM-FCM. Therefore, it can be used as an alternative tool for handling missing data in order to obtain a better-quality dataset.
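The FCM imputation step named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how FPCA and SVM are combined with FCM, so both are omitted here. The sketch assumes one common variant in which centroids are fitted on complete rows only, and each missing entry is filled with the membership-weighted average of the centroid values. The function names (`fcm_fit`, `fcm_impute`, `rmse`, `mae`), the fuzzifier `m=2.0`, the cluster count, and the synthetic two-cluster data are all illustrative assumptions; the Pima Indians Diabetes dataset is not used.

```python
import numpy as np


def fcm_fit(X, n_clusters=2, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Plain fuzzy c-means on complete rows; returns the cluster centroids."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)  # memberships sum to 1 per row
    for _ in range(n_iter):
        W = U ** m
        # Centroids are membership-weighted means of the rows.
        centroids = (W.T @ X) / W.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        # Standard membership update: u_ik proportional to d_ik^(-2/(m-1)).
        U_new = np.maximum(d, 1e-12) ** (-2.0 / (m - 1.0))
        U_new /= U_new.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centroids


def fcm_impute(X, centroids, m=2.0):
    """Fill NaNs with membership-weighted centroid values (assumed variant)."""
    X_imp = X.copy()
    for i in np.flatnonzero(np.isnan(X).any(axis=1)):
        obs = ~np.isnan(X[i])
        # Distances use only the features observed in this row.
        d = np.linalg.norm(centroids[:, obs] - X[i, obs], axis=1)
        u = np.maximum(d, 1e-12) ** (-2.0 / (m - 1.0))
        u /= u.sum()
        X_imp[i, ~obs] = u @ centroids[:, ~obs]
    return X_imp


def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def mae(y_true, y_pred):
    return float(np.mean(np.abs(y_true - y_pred)))


# Synthetic stand-in data: two Gaussian clusters with 10% of entries removed.
rng = np.random.default_rng(42)
X_true = np.vstack([rng.normal(0.0, 1.0, (100, 4)),
                    rng.normal(5.0, 1.0, (100, 4))])
mask = rng.random(X_true.shape) < 0.10
X_missing = X_true.copy()
X_missing[mask] = np.nan

complete = ~np.isnan(X_missing).any(axis=1)
centroids = fcm_fit(X_missing[complete], n_clusters=2)
X_imp = fcm_impute(X_missing, centroids)

print("RMSE:", rmse(X_true[mask], X_imp[mask]))
print("MAE: ", mae(X_true[mask], X_imp[mask]))
```

RMSE and MAE are computed only over the entries that were masked, since the observed entries are left unchanged. FCM distances are scale-sensitive, so real data with features on different scales would likely need normalization first.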
Pages: 12