On a Theory of Nonparametric Pairwise Similarity for Clustering: Connecting Clustering to Classification

Cited: 0
Authors
Yang, Yingzhen [1 ]
Liang, Feng [1 ]
Yan, Shuicheng [2 ]
Wang, Zhangyang [1 ]
Huang, Thomas S. [1 ]
Affiliations
[1] Univ Illinois, Urbana, IL 61801 USA
[2] Natl Univ Singapore, Singapore 117576, Singapore
Funding
US National Science Foundation;
Keywords
RATES; CONSISTENCY; UNIFORM;
DOI
None
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Pairwise clustering methods partition the data space into clusters based on the pairwise similarity between data points. The success of pairwise clustering largely depends on the pairwise similarity function defined over the data points, for which kernel similarity is broadly used. In this paper, we present a novel pairwise clustering framework that bridges the gap between clustering and multi-class classification. This framework learns an unsupervised nonparametric classifier from each data partition and searches for the optimal partition of the data by minimizing the generalization error of the learned classifiers associated with the data partitions. We consider two nonparametric classifiers in this framework: the nearest neighbor classifier and the plug-in classifier. Modeling the underlying data distribution by nonparametric kernel density estimation, we show that the generalization error bounds for both unsupervised nonparametric classifiers reduce to sums of nonparametric pairwise similarity terms between the data points, which can then serve as similarity measures for clustering. Under a uniform distribution, the nonparametric similarity terms induced by both unsupervised classifiers take a well-known form of kernel similarity. We also prove that the generalization error bound for the unsupervised plug-in classifier is asymptotically equal to the weighted volume of the cluster boundary [1] for Low Density Separation, a widely used criterion for semi-supervised learning and clustering. Based on the nonparametric pairwise similarity derived from the plug-in classifier, we propose a new nonparametric exemplar-based clustering method with enhanced discriminative capability, whose superiority is demonstrated by the experimental results.
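To make the kernel-similarity setting of the abstract concrete, the sketch below computes a Gaussian kernel pairwise similarity matrix and a nonparametric kernel density estimate. This is only an illustration of the standard constructions the abstract refers to, not the paper's derived similarity; the function names and the fixed bandwidth are assumptions made here for the example.

```python
import numpy as np

def gaussian_kernel_similarity(X, bandwidth=1.0):
    """Pairwise Gaussian (RBF) kernel similarity matrix for the rows of X.

    S[i, j] = exp(-||x_i - x_j||^2 / (2 * bandwidth^2))
    """
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))

def kernel_density_estimate(X, query, bandwidth=1.0):
    """Gaussian kernel density estimate of the data distribution at query points.

    Averages a normalized Gaussian kernel centered at each data point.
    """
    d = X.shape[1]
    sq_dists = np.sum((query[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    norm = (2.0 * np.pi * bandwidth ** 2) ** (d / 2.0)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2)).mean(axis=1) / norm

if __name__ == "__main__":
    X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
    S = gaussian_kernel_similarity(X)          # 3x3 symmetric, ones on diagonal
    dens = kernel_density_estimate(X, X)       # density at each data point
    print(S.round(3))
    print(dens.round(3))
```

In the framework described above, such kernel terms appear inside the classifiers' generalization error bounds; here they are shown only in their standalone textbook form.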
Pages: 9
Related papers
50 records total
  • [1] Learning pairwise similarity for data clustering
    Fred, Ana L. N.
    Jain, Anil K.
    18TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION, VOL 1, PROCEEDINGS, 2006, : 925 - +
  • [2] gCLUPS: Graph Clustering Based on Pairwise Similarity
    Yulita, Intan Nurma
    Wasito, Ito
    Mujiono
    2013 INTERNATIONAL CONFERENCE OF INFORMATION AND COMMUNICATION TECHNOLOGY (ICOICT), 2013, : 77 - 81
  • [3] Scalable Bayesian Nonparametric Clustering and Classification
    Ni, Yang
    Muller, Peter
    Diesendruck, Maurice
    Williamson, Sinead
    Zhu, Yitan
    Ji, Yuan
    JOURNAL OF COMPUTATIONAL AND GRAPHICAL STATISTICS, 2020, 29 (01) : 53 - 65
  • [4] A set theory based similarity measure for text clustering and classification
    Amer, Ali A.
    Abdalla, Hassan I.
    JOURNAL OF BIG DATA, 2020, 7 (01)
  • [6] A Similarity Measure for Text Classification and Clustering
    Lin, Yung-Shen
    Jiang, Jung-Yi
    Lee, Shie-Jue
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2014, 26 (07) : 1575 - 1590
  • [7] Hierarchical Clustering Without Pairwise Distances by Incremental Similarity Search
    Schubert, Erich
    SIMILARITY SEARCH AND APPLICATIONS, SISAP 2024, 2025, 15268 : 238 - 252
  • [8] A theory of similarity functions for learning and clustering
    Blum, Avrim
    DISCOVERY SCIENCE, PROCEEDINGS, 2007, 4755 : 39 - 39
  • [9] A Latent Variable Pairwise Classification Model of a Clustering Ensemble
    Berikov, Vladimir
    MULTIPLE CLASSIFIER SYSTEMS, 2011, 6713 : 279 - 288
  • [10] A Nonparametric Bayesian Dictionary Learning Algorithm with Clustering Structure Similarity
    Dong Daoguang
    Rui Guosheng
    Tian Wenbiao
    Zhang Yang
    Liu Ge
    JOURNAL OF ELECTRONICS & INFORMATION TECHNOLOGY, 2020, 42 (11) : 2765 - 2772