Techniques to Deal with Missing Data

Cited by: 0
Authors
Sessa, Jadran [1]
Syed, Dabeeruddin [1]
Affiliations
[1] Masdar Inst Sci & Technol, Dept Elect Engn & Comp Sci, Abu Dhabi, U Arab Emirates
Keywords
Data mining; Missing data; Missing values; Probabilistic approach; k-NN imputation; Mean and Median imputation; IMPUTATION; VALUES; DATABASES;
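DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract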
Data is available in enormous amounts in the real world, but none of it is of practical use unless it is converted into useful information. Knowledge discovery is hindered, however, because real data is often incomplete and noisy. The problem of recovering missing data now occupies an important place in the field of data mining. Filling in missing data is a significant task because the given datasets are generally very small, so it is paramount to use all of the available data. In this paper, we deal with real data that contains many missing values, and we process it in three phases. The first phase performs feature selection. The second phase iteratively fills in the missing values using a probabilistic approach, taking into account that features can be either nominal or numerical. The third phase corrects the values that have been filled in. We compare two imputation methods for dealing with the missing data: k-NN imputation, and mean and median imputation. We find that both imputation methods are efficient and yield roughly the same accuracy.
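This record does not include the paper's implementation. The following is only a minimal sketch of the two imputation families the abstract compares, written under stated assumptions. The function names `impute_column` and `knn_impute`, the use of `None` to mark a missing value, the mode strategy for nominal features, and the distance measure (root mean squared difference over the features two rows both observe) are illustrative choices, not details taken from the paper.

```python
import math
from statistics import mean, median, mode


def impute_column(values, strategy="mean"):
    """Fill None entries of one column with a statistic of the observed entries.

    'mean' and 'median' suit numerical features; 'mode' is an assumed
    choice for nominal features.
    """
    observed = [v for v in values if v is not None]
    if not observed:
        return list(values)  # nothing to learn from, leave the column as-is
    if strategy == "mean":
        fill = mean(observed)
    elif strategy == "median":
        fill = median(observed)
    elif strategy == "mode":
        fill = mode(observed)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [fill if v is None else v for v in values]


def knn_impute(rows, k=2):
    """Fill None entries of numerical rows from the k nearest donor rows.

    A donor is any other row that has the missing feature observed.
    Each gap is filled with the mean of the donors' values.
    """

    def distance(a, b):
        # Compare only the features that both rows have observed.
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    filled = [list(row) for row in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            donors = [
                (distance(row, other), other[j])
                for n, other in enumerate(rows)
                if n != i and other[j] is not None
            ]
            # Drop rows that share no observed feature with the current row.
            donors = [d for d in donors if d[0] != math.inf]
            donors.sort(key=lambda d: d[0])
            nearest = donors[:k]
            if nearest:  # leave the gap if no usable donor exists
                filled[i][j] = mean(v for _, v in nearest)
    return filled
```

Mean and median imputation ignore relationships between features and fill every gap in a column with the same value, whereas the k-NN variant borrows values from similar rows. Distances are always computed from the original rows, so earlier fills do not influence later ones.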
Pages: 4
Related papers
50 in total
  • [1] How to Deal With Missing Data?
    Rabung, Sven
    PSYCHOTHERAPIE PSYCHOSOMATIK MEDIZINISCHE PSYCHOLOGIE, 2010, 60 (12) : 485 - 485
  • [2] How to Deal with Missing Data?
    Vibha, Deepti
    Prasad, Kameshwar
    NEUROLOGY INDIA, 2020, 68 (04) : 886 - 888
  • [3] Attrition in longitudinal studies: How to deal with missing data
    Twisk, J
    de Vente, W
    JOURNAL OF CLINICAL EPIDEMIOLOGY, 2002, 55 (04) : 329 - 337
  • [4] What's the big deal about missing data?
    Cole, RE
    Feinstein, NF
    Bender, NL
    APPLIED NURSING RESEARCH, 2001, 14 (04) : 225 - 226
  • [5] Statistical primer: how to deal with missing data in scientific research?
    Papageorgiou, Grigorios
    Grant, Stuart W.
    Takkenberg, Johanna J. M.
    Mokhles, Mostafa M.
    INTERACTIVE CARDIOVASCULAR AND THORACIC SURGERY, 2018, 27 (02) : 153 - 158
  • [6] How can I deal with missing data in my study?
    Bennett, DA
    AUSTRALIAN AND NEW ZEALAND JOURNAL OF PUBLIC HEALTH, 2001, 25 (05) : 464 - 469
  • [7] Missing Data Techniques for Factor Analysis
    Wang, Hong-Long
    Yang, Meng-Li
    Chen, Chun-Ju
    Lin, Ting-Hsiang
    JOURNAL OF RESEARCH IN EDUCATION SCIENCES, 2012, 57 (01) : 29 - 50
  • [8] THE MYSTERY OF THE MISSING DEAL
    GRIDGEMAN, NT
    AMERICAN STATISTICIAN, 1964, 18 (01) : 15 - 16
  • [9] Testing Measurement Invariance with Ordinal Missing Data: A Comparison of Estimators and Missing Data Techniques
    Chen, Po-Yi
    Wu, Wei
    Garnier-Villarreal, Mauricio
    Kite, Benjamin Arthur
    Jia, Fan
    MULTIVARIATE BEHAVIORAL RESEARCH, 2020, 55 (01) : 87 - 101
  • [10] Missing Data in Multiple Item Scales: A Monte Carlo Analysis of Missing Data Techniques
    Roth, Philip L.
    Switzer, Fred S., III
    Switzer, Deborah M.
    ORGANIZATIONAL RESEARCH METHODS, 1999, 2 (03) : 211 - 232