Unsupervised data pruning for clustering of noisy data

Cited by: 8
Authors
Hong, Yi [1]
Kwong, Sam [1]
Chang, Yuchou [2]
Ren, Qingsheng [3]
Affiliations
[1] City Univ Hong Kong, Dept Comp Sci, Kowloon, Hong Kong, Peoples R China
[2] Brigham Young Univ, Dept Elect & Comp Engn, Provo, UT 84602 USA
[3] Shanghai Jiao Tong Univ, Dept Comp Sci & Engn, Shanghai 200030, Peoples R China
Keywords
Clustering analysis; Clustering ensembles; Data pruning
DOI
10.1016/j.knosys.2008.03.052
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Data pruning identifies noisy instances in a data set and removes them in order to improve the generalization of a learning algorithm. It has been well studied in supervised classification, where the identification and removal of noisy instances are guided by the available instance labels. However, to the best of the authors' knowledge, very little work has been done on data pruning for unsupervised clustering. This paper addresses data pruning for unsupervised clustering, where instance labels are not known beforehand. We claim that unsupervised data pruning can benefit the clustering of noisy data. We propose a feasible approach, termed unsupervised Data Pruning using Ensembles of multiple Clusterers (DPEC), to identify the noisy instances of a data set. DPEC checks every instance of a data set and identifies noisy ones by using an ensemble of the clustering results that different clusterers produce on the same data set. We test DPEC on several real data sets with artificial noise. Experimental results demonstrate that DPEC is often able to improve the accuracy and robustness of the clustering algorithm. (c) 2008 Elsevier B.V. All rights reserved.
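The abstract does not say how DPEC combines the clustering results, so the following is only a minimal sketch of one plausible ensemble-based pruning scheme, not the authors' method. It assumes that an instance is suspicious when the ensemble disagrees about which other instances it belongs with, measured through co-association frequencies. The function names (`kmeans_labels`, `consistency_scores`, `prune`), the use of k-means as the base clusterer, and the pruning fraction are all hypothetical choices made for illustration.

```python
import random


def kmeans_labels(points, k, rng, iters=20):
    """Plain Lloyd's k-means (an assumed base clusterer); one label per point."""
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Move each non-empty cluster's center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels


def consistency_scores(partitions):
    """Score each instance by how decisively the ensemble groups it.

    For a pair (i, j), co is the fraction of partitions that put i and j in
    the same cluster. max(co, 1 - co) is 1.0 when the ensemble always agrees
    and 0.5 when it is split evenly. The score of i is the mean over all j.
    """
    n = len(partitions[0])
    m = len(partitions)
    scores = []
    for i in range(n):
        total = 0.0
        for j in range(n):
            if i == j:
                continue
            co = sum(part[i] == part[j] for part in partitions) / m
            total += max(co, 1.0 - co)
        scores.append(total / (n - 1))
    return scores


def prune(points, scores, fraction):
    """Drop the given fraction of instances with the lowest consistency."""
    n_drop = int(len(points) * fraction)
    order = sorted(range(len(points)), key=lambda i: scores[i])
    dropped = set(order[:n_drop])
    kept = [p for i, p in enumerate(points) if i not in dropped]
    return kept, sorted(dropped)


if __name__ == "__main__":
    rng = random.Random(0)
    # Two synthetic blobs plus uniform background noise (illustrative data only).
    data = [(rng.gauss(0, 0.3), rng.gauss(0, 0.3)) for _ in range(20)]
    data += [(rng.gauss(4, 0.3), rng.gauss(4, 0.3)) for _ in range(20)]
    data += [(rng.uniform(-2, 6), rng.uniform(-2, 6)) for _ in range(5)]
    # Ensemble: several k-means runs with different k and initializations.
    ensemble = [kmeans_labels(data, k, rng) for k in (2, 3, 4) for _ in range(5)]
    kept, dropped = prune(data, consistency_scores(ensemble), fraction=0.1)
    print("dropped indices:", dropped)
```

Varying both the number of clusters and the random initialization is one simple way to obtain diverse clusterers. A final clustering would then be run on the `kept` instances only.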
Pages: 612-616
Number of pages: 5