CLASSIFIERS ACCURACY IMPROVEMENT BASED ON MISSING DATA IMPUTATION

Cited by: 28
Authors
Jordanov, Ivan [1]
Petrov, Nedyalko [1]
Petrozziello, Alessio [1]
Affiliation
[1] Univ Portsmouth, Sch Comp, Portsmouth PO1 3FE, Hants, England
Keywords
machine learning; missing data; model-based imputation; neural networks; random forests; support vector machines; radar signal classification; NEURAL-NETWORK; CLASSIFICATION; RECOGNITION;
DOI
10.1515/jaiscr-2018-0002
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper we extend our previous work on radar signal identification and classification, based on a data set comprising continuous, discrete and categorical data that represent radar pulse train characteristics such as signal frequencies, pulse repetition, type of modulation, intervals, scan period, scanning type, etc. Like most real-world datasets, it contains a high percentage of missing values. To deal with this problem, we investigate three imputation techniques: Multiple Imputation (MI); K-Nearest Neighbour Imputation (KNNI); and Bagged Tree Imputation (BTI). We apply these methods to data samples with up to 60% missingness, thereby doubling the number of instances with complete values in the resulting dataset. The performance of the imputation models is assessed with Wilcoxon's test for statistical significance and Cohen's effect size metrics. To solve the classification task, we employ three intelligent approaches: Neural Networks (NN); Support Vector Machines (SVM); and Random Forests (RF). Subsequently, we critically analyse which imputation method most influences the classifiers' performance, using a multiclass classification accuracy metric based on the area under the ROC curves. We consider two superclasses ('military' and 'civil'), each containing several 'subclasses', and propose two new metrics: inner class accuracy (IA) and outer class accuracy (OA), in addition to the overall classification accuracy (OCA) metric. We conclude that they can be used as a complement to the OCA when choosing the best classifier for the problem at hand.
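As a rough illustration of two ideas named in the abstract, the sketch below shows a plain k-nearest-neighbour imputation and a Cohen's d effect size in standard-library Python. It is not the authors' implementation: the abstract does not give their distance measure, value of k, or how the effect size was computed, so the function names `knn_impute` and `cohens_d`, the rescaled Euclidean distance, and the pooled-standard-deviation form of Cohen's d are all assumptions made for this example.

```python
import math


def knn_impute(rows, k=3):
    """Fill None entries with the mean of that feature over the k nearest rows.

    Assumption: distance is Euclidean over the features observed in both
    rows, rescaled by the share of features that could be compared.
    """
    def distance(a, b):
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf  # no feature in common, rows cannot be compared
        squared = sum((x - y) ** 2 for x, y in shared)
        return math.sqrt(squared * len(a) / len(shared))

    imputed = [list(row) for row in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate donors: other rows that observe feature j.
            donors = [
                (distance(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            ]
            donors = [d for d in donors if d[0] != math.inf]
            donors.sort(key=lambda d: d[0])
            nearest = donors[:k]
            if nearest:  # leave the gap if no usable donor exists
                imputed[i][j] = sum(v for _, v in nearest) / len(nearest)
    return imputed


def cohens_d(a, b):
    """Cohen's d for two samples, using the pooled standard deviation."""
    na, nb = len(a), len(b)
    mean_a, mean_b = sum(a) / na, sum(b) / nb
    var_a = sum((x - mean_a) ** 2 for x in a) / (na - 1)
    var_b = sum((x - mean_b) ** 2 for x in b) / (nb - 1)
    pooled = math.sqrt(((na - 1) * var_a + (nb - 1) * var_b) / (na + nb - 2))
    return (mean_a - mean_b) / pooled


# Hypothetical toy data: the second row is missing its second feature.
data = [[1.0, 2.0], [1.1, None], [5.0, 10.0], [0.9, 2.2]]
filled = knn_impute(data, k=2)
```

In this toy example the missing value is filled from the two rows closest on the observed feature. In practice, a comparison like the one the abstract describes would repeat such an imputation over many samples and compare the resulting error distributions between methods.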
Pages: 31-48
Page count: 18