Combining Semi-supervised Clustering and Classification Under a Generalized Framework

Authors
Jiang, Zhen [1, 2]
Zhao, Lingyun [1 ]
Lu, Yu [1 ]
Affiliations
[1] Jiangsu Univ, Sch Comp Sci & Commun Engn, Zhenjiang, Peoples R China
[2] Jiangsu Prov Big Data Ubiquitous Percept & Intelli, Zhenjiang, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Co-training; Classification; Semi-supervised clustering; Cluster-splitting;
DOI
10.1007/s00357-024-09489-9
Chinese Library Classification (CLC) number
O1 [Mathematics]
Discipline classification code
0701; 070101
Abstract
Most machine learning algorithms need a sufficient amount of labeled data to train a reliable classifier. However, labeling data is often costly and time-consuming, while unlabeled data is readily accessible. Learning from both labeled and unlabeled data has therefore become an active research topic. Inspired by the co-training algorithm, we present a learning framework called CSCC, which combines semi-supervised clustering and classification to learn from both labeled and unlabeled data. Unlike existing co-training style methods, which construct diverse classifiers that learn from each other, CSCC leverages the diversity between semi-supervised clustering and classification models to achieve mutual enhancement. Existing classification algorithms can be easily adapted to CSCC, allowing them to generalize from a small amount of labeled data. In particular, to bridge the gap between class information and clustering, we propose a semi-supervised hierarchical clustering algorithm that uses labeled data to guide the cluster-splitting process. Within the CSCC framework, we introduce two loss functions to supervise the iterative updating of the semi-supervised clustering model and the classification model, respectively. Extensive experiments on a variety of benchmark datasets show that CSCC outperforms other state-of-the-art methods.
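
The abstract gives no algorithmic detail, so the following is only a minimal sketch of the general idea of label-guided cluster-splitting, not the CSCC algorithm itself. It assumes a simple rule: a cluster is split whenever its labeled points disagree, with the two child clusters seeded by the centroids of its two most frequent labeled classes. The function names (`split_by_labels`, `pseudo_labels`), the splitting rule, and the majority-vote pseudo-labeling are all hypothetical; the two loss functions and the iterative exchange with a classifier described in the abstract are omitted.

```python
from collections import Counter
from math import dist


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))


def split_by_labels(X, y):
    """Divisive clustering guided by labels (illustrative rule, not from the paper).

    X: list of numeric tuples; y: list of labels, with None for unlabeled points.
    Returns a list of clusters, each a list of indices into X.
    """
    pending = [list(range(len(X)))]
    done = []
    while pending:
        idx = pending.pop()
        counts = Counter(y[i] for i in idx if y[i] is not None)
        if len(counts) <= 1:
            done.append(idx)  # labeled points agree (or none present): stop splitting
            continue
        # Seed the two children with the centroids of the two most frequent classes.
        (a, _), (b, _) = counts.most_common(2)
        seeds = [centroid([X[i] for i in idx if y[i] == c]) for c in (a, b)]
        left = {i for i in idx if dist(X[i], seeds[0]) <= dist(X[i], seeds[1])}
        right = [i for i in idx if i not in left]
        if not left or not right:
            done.append(idx)  # seeds coincide, so the cluster cannot be separated
            continue
        pending += [sorted(left), right]
    return done


def pseudo_labels(X, y):
    """Give each unlabeled point the majority label of its final cluster."""
    labels = list(y)
    for idx in split_by_labels(X, y):
        counts = Counter(y[i] for i in idx if y[i] is not None)
        if counts:
            majority = counts.most_common(1)[0][0]
            for i in idx:
                if labels[i] is None:
                    labels[i] = majority
    return labels


if __name__ == "__main__":
    # Two well-separated groups, one labeled point in each.
    X_demo = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    y_demo = ["a", None, None, "b", None, None]
    print(pseudo_labels(X_demo, y_demo))
```

In a co-training style loop, pseudo-labels like these could be passed to a classifier and the classifier's confident predictions fed back to guide further splitting; how CSCC actually performs that exchange is not stated in the abstract.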
Pages: 181-204
Page count: 24