Input validation for semi-supervised clustering

Authors
Yip, Kevin Y.
Ng, Michael K.
Cheung, David W.
Affiliations
[1] Yale Univ, Dept Comp Sci, New Haven, CT 06520 USA
[2] Hong Kong Baptist Univ, Dept Math, Hong Kong, Hong Kong, Peoples R China
DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Semi-supervised clustering is practical when some domain knowledge exists that could help the clustering process but is unsuitable or insufficient for supervised learning. There have been a number of studies on semi-supervised clustering, but almost all of them assume that the input knowledge is correct or largely correct. In this paper we show that even a small proportion of incorrect input knowledge can make a semi-supervised clustering algorithm perform worse than it does with no inputs at all. This is a real concern, since in real applications it is reasonable to expect problematic "knowledge inputs" that are wrong or inappropriate for the clustering task. We propose a general methodology for detecting potentially incorrect inputs and verifying them. Based on this methodology, we outline some methods for validating the inputs of the semi-supervised clustering algorithm MPCK-Means. Experimental results show that the input validation step is both critical and effective: incorrect inputs lowered the clustering accuracy of MPCK-Means, but the lost accuracy was recovered when validation was performed.
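The abstract does not spell out how potentially incorrect inputs are detected, so the following is only an illustrative sketch and not the paper's method. It assumes the inputs are pairwise must-link and cannot-link constraints, and it flags a constraint when the distance between its two points contradicts the constraint type. The function name `flag_suspicious_constraints`, the thresholds `far` and `near`, and the toy data are all hypothetical.

```python
from itertools import combinations
from math import dist
from statistics import median


def flag_suspicious_constraints(points, constraints, far=0.5, near=0.25):
    """Return the constraints whose pair distance contradicts their type.

    Hypothetical heuristic (an assumption, not taken from the paper):
    distances are compared with the median pairwise distance of the data.
    A must-link pair farther apart than far * median, or a cannot-link
    pair closer than near * median, is flagged for verification.
    """
    scale = median(dist(p, q) for p, q in combinations(points, 2))
    suspicious = []
    for i, j, kind in constraints:
        d = dist(points[i], points[j])
        if kind == "must" and d > far * scale:
            suspicious.append((i, j, kind))
        elif kind == "cannot" and d < near * scale:
            suspicious.append((i, j, kind))
    return suspicious


# Toy data: two well-separated groups of three points each.
points = [
    (0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
    (5.0, 5.0), (5.1, 5.2), (5.2, 4.9),
]
constraints = [
    (0, 1, "must"),    # same group: consistent with the data
    (0, 3, "cannot"),  # different groups: consistent with the data
    (1, 4, "must"),    # different groups: contradicts the data
    (3, 5, "cannot"),  # same group: contradicts the data
]
flagged = flag_suspicious_constraints(points, constraints)
print(flagged)
```

In this sketch a flagged constraint is only a candidate for verification, for example by asking the person who supplied it, rather than something to discard automatically.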
Pages: 479-483
Number of pages: 5