Data mining and the impact of missing data

Cited by: 95
Authors
Brown, ML [1]
Kros, JF
Affiliations
[1] Hawaii Pacific Univ, Sch Business, Honolulu, HI USA
[2] E Carolina Univ, Dept Decis Sci, Greenville, NC USA
Keywords
data handling; database management systems; information gathering; information retrieval
DOI
10.1108/02635570310497657
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
The actual data mining process deals significantly with prediction, estimation, classification, pattern recognition and the development of association rules. Therefore, the significance of the analysis depends heavily on the accuracy of the database and on the chosen sample data to be used for model training and testing. Data mining is based upon searching the concatenation of multiple databases that usually contain some amount of missing data along with a variable percentage of inaccurate data, pollution, outliers and noise. The issue of missing data must be addressed since ignoring this problem can introduce bias into the models being evaluated and lead to inaccurate data mining conclusions. The objective of this research is to address the impact of missing data on the data mining process.
Pages: 611-621
Page count: 11
Related papers
50 in total
  • [31] A Valued Tolerance Approach to Missing Attribute Values in Data Mining
    Grzymala-Busse, Jerzy W.
    Hippe, Zdzislaw S.
    Rzasa, Wojciech
    Vasudevan, Supriya
    HSI: 2009 2ND CONFERENCE ON HUMAN SYSTEM INTERACTIONS, 2009, : 217 - 224
  • [32] Applying data mining algorithms to inpatient dataset with missing values
    Liu, Peng
    El-Darzi, Elia
    Lei, Lei
    Vasilakis, Christos
    Chountas, Panagiotis
    Huang, Wei
    JOURNAL OF ENTERPRISE INFORMATION MANAGEMENT, 2007, 21 (01) : 81 - +
  • [33] Missing Value Treatment of the Data Mining Based on Bayesian Principle
    Qiu Zhao
    Meng Mingrui
    Huang Jun
    ICCSE 2008: PROCEEDINGS OF THE THIRD INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE & EDUCATION: ADVANCED COMPUTER TECHNOLOGY, NEW EDUCATION, 2008, : 81 - 84
  • [34] Mining missing train logs from Smart Card data
    Min, Yun-Hong
    Ko, Suk-Joon
    Kim, Kyung Min
    Hong, Sung-Pil
    TRANSPORTATION RESEARCH PART C-EMERGING TECHNOLOGIES, 2016, 63 : 170 - 181
  • [35] Decision-rule solutions for data mining with missing values
    Weiss, SM
    Indurkhya, N
    ADVANCES IN ARTIFICIAL INTELLIGENCE, 2000, 1952 : 1 - 10
  • [37] Handling of missing data to improve the mining of large feed databases
    Maroto-Molina, F.
    Gomez-Cabrera, A.
    Guerrero-Ginel, J. E.
    Garrido-Varo, A.
    Sauvant, D.
    Tran, G.
    Heuze, V.
    Perez-Marin, D. C.
    JOURNAL OF ANIMAL SCIENCE, 2013, 91 (01) : 491 - 500
  • [38] The Impact of Electronic Data Capture System Data Entry Time Delay on Missing Data Points
    Sinani, Ervin
    MUSCLE & NERVE, 2022, 66 : S39 - S39
  • [39] Impact of Missing Data on Phylogenies Inferred from Empirical Phylogenomic Data Sets
    Roure, Beatrice
    Baurain, Denis
    Philippe, Herve
    MOLECULAR BIOLOGY AND EVOLUTION, 2013, 30 (01) : 197 - 214
  • [40] Impact of missing data in evaluating artificial neural networks trained on complete data
    Markey, MK
    Tourassi, GD
    Margolis, M
    DeLong, DM
    COMPUTERS IN BIOLOGY AND MEDICINE, 2006, 36 (05) : 516 - 525