Use of Data Mining for Intelligent Evaluation of Imputation Methods

Cited by: 0
Authors
Martinez, David L. la Red [1]
Primorac, Carlos R. [2]
Affiliations
[1] Natl Technol Univ, Resistencia Reg Fac, Resistencia, Argentina
[2] Natl Univ Northeast, Comp Sci Dept, Corrientes, Argentina
Keywords
Computer Science; Data Imputation; Data Mining; Interdisciplinary Applications; Performance Evaluation of Imputation Methods
DOI
10.9781/ijimai.2023.03.002
Chinese Library Classification (CLC)
TP18 [Theory of Artificial Intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In real-world situations, researchers frequently face missing values (MV), i.e., values not observed in a data set. Data imputation techniques estimate MV using different algorithms, so that important data can be imputed for a particular instance. Most of the literature in this field deals with different imputation methods. However, few studies compare the methods in order to provide appropriate guidelines for selecting the method to apply when imputing data in specific situations. The objective of this work is to present a methodology for evaluating the performance of imputation methods by means of new metrics derived from the quality metrics of data mining models. We started from a complete dataset, which was amputated with different amputation mechanisms to generate 63 datasets with MV; these were imputed using the Median, k-NN, k-Means and Hot-Deck imputation methods. The performance of the imputation methods was evaluated using new metrics derived from the quality metrics of the data mining processes performed with the original complete file and with the imputed files. This evaluation is not based on measuring the imputation error (the usual approach), but on the similarity between the quality-metric values of the data mining processes obtained with the original file and those obtained with the imputed files. The results show that, considered globally and according to the newly proposed metric, the imputation methods with the best performance were k-NN and k-Means. An additional advantage of the proposed methodology is that it provides predictive data mining models that can be used a posteriori.
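The evaluation idea in the abstract (amputate a complete dataset, impute it, then compare the quality of a data mining model built on the original data with one built on the imputed data) can be illustrated with a minimal sketch. This is not the authors' implementation: the synthetic data, the MCAR amputation, the leave-one-out 1-NN classifier, and the "accuracy gap" score are all assumptions chosen for illustration, since this record does not define the paper's metrics or models. Every function name below is hypothetical.

```python
import random
import statistics


def make_data(n=120, seed=0):
    """Synthetic two-class data with two numeric features (illustrative only)."""
    rng = random.Random(seed)
    rows = []
    for i in range(n):
        label = i % 2
        center = 0.0 if label == 0 else 3.0
        rows.append(([rng.gauss(center, 1.0), rng.gauss(center, 1.0)], label))
    return rows


def amputate_mcar(rows, rate=0.2, seed=1):
    """Delete values completely at random (assumed mechanism), at most one per row."""
    rng = random.Random(seed)
    out = []
    for x, y in rows:
        x = list(x)
        if rng.random() < rate:
            x[rng.randrange(len(x))] = None
        out.append((x, y))
    return out


def impute_median(rows):
    """Fill each gap with the median of the observed values in its column."""
    n_cols = len(rows[0][0])
    med = [statistics.median([x[j] for x, _ in rows if x[j] is not None])
           for j in range(n_cols)]
    return [([med[j] if v is None else v for j, v in enumerate(x)], y)
            for x, y in rows]


def impute_knn(rows, k=5):
    """Fill each gap with the column mean over the k nearest complete rows,
    measuring distance on the features that are observed in the row."""
    complete = [x for x, _ in rows if None not in x]
    out = []
    for x, y in rows:
        if None in x:
            obs = [j for j, v in enumerate(x) if v is not None]
            near = sorted(
                complete,
                key=lambda c: sum((c[j] - x[j]) ** 2 for j in obs),
            )[:k]
            x = [statistics.fmean([c[j] for c in near]) if v is None else v
                 for j, v in enumerate(x)]
        out.append((x, y))
    return out


def loo_accuracy(rows):
    """Leave-one-out accuracy of a 1-NN classifier (stand-in quality metric)."""
    hits = 0
    for i, (x, y) in enumerate(rows):
        _, pred = min(
            ((sum((a - b) ** 2 for a, b in zip(x, x2)), y2)
             for j, (x2, y2) in enumerate(rows) if j != i),
            key=lambda t: t[0],
        )
        hits += pred == y
    return hits / len(rows)


full = make_data()
holed = amputate_mcar(full)
base = loo_accuracy(full)

# Smaller gap = model quality on imputed data is closer to the original.
imputers = {"median": impute_median, "knn": impute_knn}
gaps = {name: abs(base - loo_accuracy(fn(holed))) for name, fn in imputers.items()}
print(gaps)
```

Under these assumptions, an imputation method is ranked by how closely the downstream model's quality on the imputed file matches its quality on the complete file, rather than by the per-value imputation error.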
Pages: 14
Related papers
50 in total
  • [31] A matter of perspective: Imputation methods and their effects illustrated with the use of data from the PRIME study
    Reich, Kristian
    Bachhuber, Teresa
    Melzer, Nima
    Sieder, Christian
    Sticherling, Michael
    [J]. JOURNAL OF THE AMERICAN ACADEMY OF DERMATOLOGY, 2018, 79 (03) : AB13 - AB13
  • [32] An overview and evaluation of recent machine learning imputation methods using cardiac imaging data
    Liu Y.
    Gopalakrishnan V.
    [J]. Liu, Yuzhe (y.liu@pitt.edu), 1600, MDPI (02):
  • [33] IMPROVING COSTING METHODS IN MULTICENTRE ECONOMIC EVALUATION: THE USE OF MULTIPLE IMPUTATION FOR UNIT COSTS
    Grieve, Richard
    Cairns, John
    Thompson, Simon G.
    [J]. HEALTH ECONOMICS, 2010, 19 (08) : 939 - 954
  • [34] Deep Learning Methods for Omics Data Imputation
    Huang, Lei
    Song, Meng
    Shen, Hui
    Hong, Huixiao
    Gong, Ping
    Deng, Hong-Wen
    Zhang, Chaoyang
    [J]. BIOLOGY-BASEL, 2023, 12 (10):
  • [35] Data Mining and Analysis of NLP Methods in Students Evaluation of Teaching
    Acosta-Ugalde, Diego
    Conant-Pablos, Santiago Enrique
    Camacho-Zuniga, Claudia
    Gutierrez-Rodriguez, Andres Eduardo
    [J]. ADVANCES IN SOFT COMPUTING, MICAI 2023, PT II, 2024, 14392 : 28 - 38
  • [36] Performance Evaluation of Methods for Mining Frequent Itemsets on Temporal Data
    Tripathi, Tripti
    Yadav, Divakar
    [J]. SECOND INTERNATIONAL CONFERENCE ON COMPUTER NETWORKS AND COMMUNICATION TECHNOLOGIES, ICCNCT 2019, 2020, 44 : 910 - 917
  • [37] Evaluation of complex petroleum reservoirs based on data mining methods
    Tan, Fengqi
    Luo, Gang
    Wang, Duojun
    Chen, Yangkang
    [J]. COMPUTATIONAL GEOSCIENCES, 2017, 21 (01) : 151 - 165
  • [38] Missing data and imputation methods in partition of variables
    da Silva, AL
    Saporta, G
    Bacelar-Nicolau, H
    [J]. CLASSIFICATION, CLUSTERING, AND DATA MINING APPLICATIONS, 2004, : 631 - 637
  • [39] Performance Evaluation of Imputation Methods for Missing Data in Logistic Regression Model: Simulation and Application
    Mohamed, Salah M.
    Abonazel, Mohamed R.
    Ghallab, Mohamed G.
    [J]. THAILAND STATISTICIAN, 2023, 21 (04) : 926 - 942
  • [40] Comparison of alternative imputation methods for ordinal data
    Cugnata, Federica
    Salini, Silvia
    [J]. COMMUNICATIONS IN STATISTICS-SIMULATION AND COMPUTATION, 2017, 46 (01) : 315 - 330