Unsupervised data pruning for clustering of noisy data

Cited by: 8
Authors
Hong, Yi [1]
Kwong, Sam [1]
Chang, Yuchou [2]
Ren, Qingsheng [3]
Affiliations
[1] City Univ Hong Kong, Dept Comp Sci, Kowloon, Hong Kong, Peoples R China
[2] Brigham Young Univ, Dept Elect & Comp Engn, Provo, UT 84602 USA
[3] Shanghai Jiao Tong Univ, Dept Comp Sci & Engn, Shanghai 200030, Peoples R China
Keywords
Clustering analysis; Clustering ensembles; Data pruning
DOI
10.1016/j.knosys.2008.03.052
CLC classification number
TP18 [Theory of artificial intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Data pruning identifies the noisy instances of a data set and removes them in order to improve the generalization of a learning algorithm. It has been well studied in supervised classification, where the identification and removal of noisy instances are guided by the available instance labels. However, to the best of the authors' knowledge, very little work has been done on data pruning for unsupervised clustering. This paper addresses data pruning for unsupervised clustering when the labels of instances are not known beforehand. We claim that unsupervised data pruning can benefit the clustering of noisy data. We propose a feasible approach, termed unsupervised Data Pruning using Ensembles of multiple Clusterers (DPEC), to identify the noisy instances of a data set. DPEC checks every instance of a data set and identifies noisy ones by using an ensemble of the clustering results produced by different clusterers on the same data set. We test the performance of DPEC on several real data sets with artificial noise. Experimental results demonstrate that DPEC is often able to improve the accuracy and robustness of the clustering algorithm. (c) 2008 Elsevier B.V. All rights reserved.
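The abstract only outlines how DPEC combines several clusterers, so the following is a minimal sketch of ensemble-based noise identification in general, not the authors' algorithm. It assumes each clusterer's output is simply a list of cluster ids (the clusterers themselves are left out), builds a co-association matrix recording how often each pair of instances is grouped together, and flags the instances whose pairings the ensemble cannot agree on. The consensus score, the names `co_association`, `instance_scores` and `prune_noisy`, and the `ratio` parameter are all hypothetical choices made for this illustration.

```python
from typing import List, Sequence


def co_association(labelings: Sequence[Sequence[int]]) -> List[List[float]]:
    """A[i][j] = fraction of clusterings that place instances i and j in the same cluster.

    Cluster ids only need to be consistent within one clustering, so runs whose
    labels are permuted (or use entirely different ids) can be combined directly.
    """
    runs = len(labelings)
    if runs == 0:
        raise ValueError("need at least one clustering")
    n = len(labelings[0])
    counts = [[0] * n for _ in range(n)]
    for labels in labelings:
        if len(labels) != n:
            raise ValueError("every clustering must label all n instances")
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / runs for c in row] for row in counts]


def instance_scores(labelings: Sequence[Sequence[int]]) -> List[float]:
    """Consensus score in [0, 1] per instance (hypothetical criterion, not from the paper).

    |2*A[i][j] - 1| is 1 when the clusterers unanimously agree that i and j are
    together (or apart) and 0 when they are split evenly; averaging over all
    j != i measures how decisively the ensemble places instance i.
    """
    A = co_association(labelings)
    n = len(A)
    if n < 2:
        return [1.0] * n
    return [
        sum(abs(2 * A[i][j] - 1) for j in range(n) if j != i) / (n - 1)
        for i in range(n)
    ]


def prune_noisy(labelings: Sequence[Sequence[int]], ratio: float) -> List[int]:
    """Return the indices of the round(ratio * n) least-consistent instances."""
    scores = instance_scores(labelings)
    m = int(round(ratio * len(scores)))
    # Lowest score first; ties broken by index so the result is deterministic.
    order = sorted(range(len(scores)), key=lambda i: (scores[i], i))
    return sorted(order[:m])


if __name__ == "__main__":
    # Four hypothetical clusterers labeling 7 instances: 0-2 and 3-5 form two
    # stable groups, while instance 6 joins each group in half of the runs.
    labelings = [
        [0, 0, 0, 1, 1, 1, 0],
        [1, 1, 1, 0, 0, 0, 0],
        [0, 0, 0, 1, 1, 1, 1],
        [2, 2, 2, 5, 5, 5, 2],
    ]
    print(instance_scores(labelings))
    print(prune_noisy(labelings, ratio=0.15))  # [6]
```

Because only co-membership is compared, the clusterers may use different numbers of clusters or arbitrary cluster ids. In practice the label lists would come from running several clustering algorithms (or one algorithm with different settings) on the same data, and the remaining instances would then be clustered again.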
Pages: 612-616
Page count: 5