Unsupervised data pruning for clustering of noisy data

Cited by: 8
Authors
Hong, Yi [1]
Kwong, Sam [1]
Chang, Yuchou [2]
Ren, Qingsheng [3]
Affiliations
[1] City Univ Hong Kong, Dept Comp Sci, Kowloon, Hong Kong, Peoples R China
[2] Brigham Young Univ, Dept Elect & Comp Engn, Provo, UT 84602 USA
[3] Shanghai Jiao Tong Univ, Dept Comp Sci & Engn, Shanghai 200030, Peoples R China
Keywords
Clustering analysis; Clustering ensembles; Data pruning
DOI
10.1016/j.knosys.2008.03.052
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Data pruning identifies noisy instances in a data set and removes them in order to improve the generalization of a learning algorithm. It has been well studied in supervised classification, where the identification and removal of noisy instances are guided by the available instance labels. However, to the best of the authors' knowledge, very little work has been done on data pruning for unsupervised clustering. This paper addresses data pruning for unsupervised clustering, where instance labels are not known beforehand. We claim that unsupervised data pruning can benefit the clustering of noisy data. We propose a feasible approach, termed unsupervised Data Pruning using Ensembles of multiple Clusterers (DPEC), to identify the noisy instances of a data set. DPEC checks every instance of a data set and identifies noisy instances by using an ensemble of clustering results produced by different clusterers on the same data set. We test the performance of DPEC on several real data sets with artificial noise. Experimental results demonstrate that DPEC is often able to improve the accuracy and robustness of the clustering algorithm. © 2008 Elsevier B.V. All rights reserved.
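The abstract does not spell out how DPEC scores an instance, so the following is only a minimal sketch of the general idea of using an ensemble of clusterings to flag unstable instances, not the authors' algorithm. The co-association matrix is a standard clustering-ensemble device; the function names, the consistency score, and the 0.75 threshold are all assumptions introduced for illustration. An instance is flagged when the clusterers disagree about which other instances it belongs with.

```python
def coassociation(partitions):
    """Fraction of partitions in which each pair of instances shares a cluster.

    `partitions` is a list of label lists, one per clusterer, all of equal length.
    Label values need not match across clusterers; only co-membership matters.
    """
    n = len(partitions[0])
    m = len(partitions)
    return [[sum(p[i] == p[j] for p in partitions) / m for j in range(n)]
            for i in range(n)]


def consistency_scores(partitions):
    """Hypothetical per-instance score in [0.5, 1.0].

    For each pair (i, j), max(c, 1 - c) is 1.0 when the clusterers always agree
    (always together or always apart) and 0.5 when they are evenly split.
    The score of instance i is the mean over all other instances j.
    """
    co = coassociation(partitions)
    n = len(co)
    scores = []
    for i in range(n):
        agreement = [max(co[i][j], 1.0 - co[i][j]) for j in range(n) if j != i]
        scores.append(sum(agreement) / len(agreement))
    return scores


def flag_noisy(partitions, threshold=0.75):
    """Indices whose consistency score falls below the (assumed) threshold."""
    return [i for i, s in enumerate(consistency_scores(partitions)) if s < threshold]


# Hand-written label vectors standing in for the output of four clusterers.
# Instances 0-2 always group together, 3-4 always group together,
# and instance 5 is assigned to either group depending on the clusterer.
partitions = [
    [0, 0, 0, 1, 1, 0],
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 1],  # same grouping as the first, with labels permuted
    [0, 0, 0, 1, 1, 1],
]

print(flag_noisy(partitions))  # prints [5]
```

In practice the label vectors would come from running different clustering algorithms, or the same algorithm with different settings, on the data set; the flagged instances would then be removed before the final clustering.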
Pages: 612-616
Page count: 5