On density based transforms for uncertain data mining

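Cited by: 0
Authors
Aggarwal, Charu C. [1]
Affiliation
[1] IBM TJ Watson Res Ctr, Hawthorne, NY 10532 USA
Keywords
DOI
Not available
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract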
In spite of the great progress in the data mining field in recent years, the problem of missing and uncertain data has remained a great challenge for data mining algorithms. Many real data sets have missing attribute values or values which are only approximately measured or imputed. In some methodologies such as privacy preserving data mining, it is desirable to explicitly add perturbations to the data in order to mask sensitive values. If the underlying data is not of high quality, one cannot expect the corresponding algorithms to perform effectively. In many cases, it may be possible to obtain quantitative measures of the errors in different entries of the data. In this paper we will show that this is very useful information for the data mining process, since it can be leveraged to improve the quality of the results. We discuss a new method for handling error-prone and missing data with the use of density based approaches to data mining. We discuss methods for constructing error-adjusted densities of data sets, and using these densities as intermediate representations in order to perform more accurate mining. We discuss the mathematical foundations behind the method and establish ways of extending it to very large scale data mining problems. As a concrete example of our technique, we show how to apply the intermediate density representation in order to accurately solve the classification problem. We show that the error-based method can be effectively and efficiently applied to very large data sets, and turns out to be very useful as a general approach to such problems.
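The abstract's idea of an error-adjusted density used as an intermediate representation for classification can be illustrated with a minimal sketch. This is not the paper's formulation. It assumes one-dimensional data, a Gaussian kernel, and a known error standard deviation for each point that simply widens that point's kernel. The function names `error_adjusted_density` and `classify`, the bandwidth `h`, and the prior-weighted decision rule are all hypothetical choices made for illustration.

```python
import numpy as np

def error_adjusted_density(x, points, errors, h):
    """Kernel density estimate in which each point's kernel is widened
    by that point's error standard deviation (illustrative assumption)."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    points = np.asarray(points, dtype=float)
    errors = np.asarray(errors, dtype=float)
    # Per-point kernel variance: bandwidth^2 plus measurement-error variance.
    var = h ** 2 + errors ** 2
    diff = x[:, None] - points[None, :]
    kernels = np.exp(-diff ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)
    # Average the normalized kernels so the estimate integrates to 1.
    return kernels.mean(axis=1)

def classify(x, class_data, h):
    """Pick the class whose prior-weighted error-adjusted density at x is largest.
    class_data maps a label to a (points, errors) pair (hypothetical layout)."""
    total = sum(len(pts) for pts, _ in class_data.values())
    scores = {}
    for label, (pts, errs) in class_data.items():
        prior = len(pts) / total
        scores[label] = prior * error_adjusted_density(x, pts, errs, h)[0]
    return max(scores, key=scores.get)
```

In this sketch, a point with a large recorded error contributes a flatter, wider kernel, so unreliable entries influence the density estimate less sharply than precise ones.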
Pages: 841 - 850
Page count: 10
Related papers
50 in total
  • [1] Mining uncertain data
    Leung, Carson Kai-Sang
    WILEY INTERDISCIPLINARY REVIEWS-DATA MINING AND KNOWLEDGE DISCOVERY, 2011, 1 (04) : 316 - 329
  • [2] DataRinse: Semantic Transforms for Data preparation based on Code Mining
    Abdelaziz, Ibrahim
    Dolby, Julian
    Khurana, Udayan
    Samulowitz, Horst
    Srinivas, Kavitha
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2023, 16 (12) : 4090 - 4093
  • [3] Data Mining for Uncertain Data Based on Difference Degree of Concept Lattice
    Wang, Qian
    Dong, Shi
    Naeem, Hamad
    JOURNAL OF INFORMATION PROCESSING SYSTEMS, 2024, 20 (03) : 317 - 327
  • [4] Mining Uncertain Event Data in Process Mining
    Pegoraro, Marco
    van der Aalst, Wil M. P.
    2019 INTERNATIONAL CONFERENCE ON PROCESS MINING (ICPM 2019), 2019 : 89 - 96
  • [5] Granular Sketch Based Uncertain Data Streams Pattern Mining
    Chen, Jingyu
    Chen, Ping
    Sheng, Xian'gang
    INFORMATION COMPUTING AND APPLICATIONS, ICICA 2013, PT I, 2013, 391 : 488 - +
  • [6] Hierarchical density-based clustering of uncertain data
    Kriegel, HP
    Pfeifle, M
    Fifth IEEE International Conference on Data Mining, Proceedings, 2005 : 689 - 692
  • [7] Stream mining on univariate uncertain data
    Liu, Ying-Ho
    APPLIED INTELLIGENCE, 2013, 39 (02) : 315 - 344
  • [8] Updating Mining Resources with Uncertain Data
    Neves, Joao
    Pereira, Maria Joao
    Pacheco, Nelson
    Soares, Amilcar
    MATHEMATICAL GEOSCIENCES, 2019, 51 (07) : 905 - 924