On density based transforms for uncertain data mining

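Author
Aggarwal, Charu C. [1]
Affiliation
[1] IBM TJ Watson Res Ctr, Hawthorne, NY 10532 USA
DOI
Not available
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract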
Despite great progress in data mining in recent years, missing and uncertain data remain a major challenge for data mining algorithms. Many real data sets have missing attribute values, or values that are only approximately measured or imputed. In some methodologies, such as privacy-preserving data mining, perturbations are deliberately added to the data in order to mask sensitive values. If the underlying data is not of high quality, one cannot expect the corresponding algorithms to perform effectively. In many cases, however, it is possible to obtain quantitative measures of the errors in different entries of the data. In this paper we show that this information is very useful for the data mining process, since it can be leveraged to improve the quality of the results. We present a new method for handling error-prone and missing data using density-based approaches to data mining. We discuss methods for constructing error-adjusted densities of data sets and for using these densities as intermediate representations in order to perform more accurate mining. We describe the mathematical foundations of the method and establish ways of extending it to very large-scale data mining problems. As a concrete example, we show how to apply the intermediate density representation to accurately solve the classification problem. We show that the error-based method can be applied effectively and efficiently to very large data sets, and that it is useful as a general approach to such problems.
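The idea of an error-adjusted density used as an intermediate representation for classification can be illustrated with a minimal sketch. It is not taken from the paper: it assumes one-dimensional data, a Gaussian kernel, and Gaussian measurement error with a known standard deviation per point, so that each point's kernel variance becomes the bandwidth squared plus the error squared. The names `error_adjusted_density`, `classify`, the bandwidth `h`, and the toy data are all hypothetical.

```python
import math


def error_adjusted_density(x, points, errors, h):
    """Kernel density estimate at x where each point's Gaussian kernel
    is widened by that point's error (assumption: variance = h^2 + err^2)."""
    total = 0.0
    for xi, err in zip(points, errors):
        var = h * h + err * err
        total += math.exp(-((x - xi) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)
    return total / len(points)


def classify(x, train, h=0.5):
    """Pick the label whose prior-weighted error-adjusted density at x is largest.

    train maps label -> (points, errors); this layout is illustrative only.
    """
    n_total = sum(len(points) for points, _ in train.values())
    scores = {}
    for label, (points, errors) in train.items():
        prior = len(points) / n_total
        scores[label] = prior * error_adjusted_density(x, points, errors, h)
    return max(scores, key=scores.get)


# Toy data: the third point of class "b" is far away but carries a large error,
# so it is spread thinly rather than forming a sharp spurious peak.
train = {
    "a": ([0.0, 0.2, 0.4], [0.1, 0.1, 0.1]),
    "b": ([2.0, 2.2, 5.0], [0.1, 0.1, 3.0]),
}

if __name__ == "__main__":
    for query in (0.1, 2.1):
        print(query, classify(query, train))
```

In this sketch a point with a large recorded error contributes a flat, wide kernel and so has little influence on any single location, which is one simple way in which error information could be used to adjust a density.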
Pages: 841-850
Number of pages: 10