Use of Data Mining for Intelligent Evaluation of Imputation Methods

Authors
Martinez, David L. la Red [1]
Primorac, Carlos R. [2]
Affiliations
[1] Natl Technol Univ, Resistencia Reg Fac, Resistencia, Argentina
[2] Natl Univ Northeast, Comp Sci Dept, Corrientes, Argentina
Keywords
Computer Science; Data Imputation; Data Mining; Interdisciplinary Applications; Performance Evaluation of Imputation Methods
DOI
10.9781/ijimai.2023.03.002
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In real-world situations, researchers frequently face the problem of missing values (MV), i.e., values that are not observed in a data set. Data imputation techniques estimate MV using different algorithms, allowing important data to be imputed for a particular instance. Most of the literature in this field deals with individual imputation methods. However, few studies provide a comparative evaluation of the different methods that could offer guidelines for selecting the most appropriate method to impute data in specific situations. The objective of this work is to present a methodology for evaluating the performance of imputation methods by means of new metrics derived from data mining processes, using quality metrics of data mining models. We started from a complete dataset, which was amputated with different amputation mechanisms to generate 63 datasets with MV; these were imputed using the Median, k-NN, k-Means and Hot-Deck imputation methods. The performance of the imputation methods was evaluated using new metrics derived from the quality metrics of the data mining processes performed on the original complete file and on the imputed files. This evaluation is not based on measuring the imputation error (the usual approach), but on the similarity between the values of the quality metrics of the data mining processes obtained with the original file and with the imputed files. The results show that, considered globally and according to the newly proposed metric, the imputation methods with the best performance were k-NN and k-Means. An additional advantage of the proposed methodology is that it provides predictive data mining models that can be used a posteriori.
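The central idea of the evaluation, comparing a quality metric of a data mining model built on the original file with the same metric on each imputed file instead of measuring the imputation error directly, can be illustrated with a small, standard-library-only Python sketch. It is not the authors' implementation: the toy two-class dataset, the MCAR amputation, the nearest-centroid classifier standing in for the "mining model", its training accuracy as the quality metric, the random hot-deck variant, and the `similarity` formula are all assumptions made for illustration, and k-Means imputation is omitted for brevity.

```python
import random
import statistics

# Illustrative sketch only: the dataset, amputation scheme, mining model, quality
# metric and similarity formula below are assumptions, not the paper's method.

N_FEATURES = 2  # columns 0..1 are features, column 2 is the class label


def make_dataset(n=200, seed=0):
    """Toy complete dataset: two Gaussian blobs, one per binary class."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        label = rng.randint(0, 1)
        centre = 2.0 if label else -2.0
        rows.append([rng.gauss(centre, 1.0), rng.gauss(centre, 1.0), label])
    return rows


def amputate_mcar(rows, rate=0.2, seed=1):
    """Set feature values to None completely at random (MCAR); labels are kept."""
    rng = random.Random(seed)
    out = [row[:] for row in rows]
    for row in out:
        for j in range(N_FEATURES):
            if rng.random() < rate:
                row[j] = None
    return out


def impute_median(rows):
    """Replace each missing value with the median of the observed column values."""
    out = [row[:] for row in rows]
    for j in range(N_FEATURES):
        med = statistics.median(r[j] for r in rows if r[j] is not None)
        for row in out:
            if row[j] is None:
                row[j] = med
    return out


def impute_hot_deck(rows, seed=2):
    """Random hot-deck (assumed variant): copy a value from a random observed donor."""
    rng = random.Random(seed)
    out = [row[:] for row in rows]
    for j in range(N_FEATURES):
        donors = [r[j] for r in rows if r[j] is not None]
        for row in out:
            if row[j] is None:
                row[j] = rng.choice(donors)
    return out


def impute_knn(rows, k=5):
    """Mean of the k nearest complete rows, measured on the row's observed features."""
    complete = [r for r in rows if None not in r[:N_FEATURES]]
    out = [row[:] for row in rows]
    for row in out:
        missing = [j for j in range(N_FEATURES) if row[j] is None]
        if not missing:
            continue
        observed = [j for j in range(N_FEATURES) if row[j] is not None]
        nearest = sorted(complete, key=lambda c: sum((row[j] - c[j]) ** 2 for j in observed))[:k]
        for j in missing:
            row[j] = statistics.fmean(c[j] for c in nearest)
    return out


def nearest_centroid_accuracy(rows):
    """Quality metric of a toy mining model: training accuracy of a nearest-centroid classifier."""
    centroids = {
        label: [statistics.fmean(r[j] for r in rows if r[-1] == label) for j in range(N_FEATURES)]
        for label in (0, 1)
    }

    def predict(r):
        return min(centroids, key=lambda c: sum((r[j] - centroids[c][j]) ** 2 for j in range(N_FEATURES)))

    return sum(predict(r) == r[-1] for r in rows) / len(rows)


def similarity(q_original, q_imputed):
    """Hypothetical score: 1.0 when the imputed file reproduces the original metric exactly."""
    return 1.0 - abs(q_original - q_imputed) / q_original


if __name__ == "__main__":
    original = make_dataset()
    amputated = amputate_mcar(original)
    q_orig = nearest_centroid_accuracy(original)
    for name, imputer in [("Median", impute_median), ("Hot-Deck", impute_hot_deck), ("k-NN", impute_knn)]:
        q_imp = nearest_centroid_accuracy(imputer(amputated))
        print(f"{name:8s} accuracy={q_imp:.3f} similarity={similarity(q_orig, q_imp):.3f}")
```

A similarity close to 1 means the imputed file yields a model whose quality metric is about the same as that of the model mined from the complete file. In the setting described above, this comparison would be repeated over the 63 amputated datasets and over the data mining processes used; the sketch covers only one dataset, one model and one metric.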
Pages: 14