Semi-supervised fuzzy co-clustering algorithm for document categorization

Cited by: 0
Authors
Yang Yan
Lihui Chen
William-Chandra Tjhi
Affiliations
[1] Nanyang Technological University, Division of Information Engineering, School of Electrical and Electronic Engineering
[2] A-Star Institute of High Performance Computing
Source
Knowledge and Information Systems, 2013, 34 (01)
Keywords
Semi-supervised clustering; Fuzzy co-clustering; Must-link/cannot-link constraints; Document categorization
DOI
Not available
Abstract
In this paper, we propose a new semi-supervised fuzzy co-clustering algorithm, SS-FCC, for the categorization of large web documents. The approach incorporates prior domain knowledge about a dataset into the fuzzy co-clustering framework in the form of user-provided pairwise constraints, each of which specifies whether a pair of objects “must” or “cannot” be clustered together. With these constraints, clustering is formulated as the problem of maximizing a competitive agglomeration cost function with fuzzy terms that takes the provided domain knowledge into account. We derive the update rules for the fuzzy memberships and design an iterative algorithm for the soft co-clustering process. Our experimental studies show that the proposed approach significantly improves the quality of clustering results. Simulations on 10 large benchmark datasets demonstrate the strength and potential of SS-FCC in terms of performance evaluation criteria, stability, and running time, compared with several existing semi-supervised algorithms.
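The interaction between pairwise constraints and fuzzy memberships can be illustrated with a small sketch. It is not the SS-FCC cost function, which the abstract does not spell out; it is a generic soft-violation score of the kind often used in constrained fuzzy clustering, written here as a penalty for readability. The function name `constraint_violation`, the membership matrix `U`, and the example values are all hypothetical.

```python
def constraint_violation(U, must_link, cannot_link):
    """Soft violation score of pairwise constraints (illustrative only).

    U           -- list of rows; U[i][c] is the fuzzy membership of
                   document i in cluster c (each row is assumed to sum to 1)
    must_link   -- pairs (i, j) that should share a cluster
    cannot_link -- pairs (i, j) that should be in different clusters
    """
    def shared(i, j):
        # Membership mass that documents i and j place in the same cluster.
        return sum(a * b for a, b in zip(U[i], U[j]))

    cost = 0.0
    for i, j in must_link:
        # Penalize the mass the pair places in different clusters.
        cost += sum(U[i]) * sum(U[j]) - shared(i, j)
    for i, j in cannot_link:
        # Penalize the mass the pair places in the same cluster.
        cost += shared(i, j)
    return cost


# Hypothetical memberships: three documents, two clusters.
U = [
    [0.9, 0.1],
    [0.8, 0.2],
    [0.1, 0.9],
]

# Constraints that agree with the memberships give a low score ...
consistent = constraint_violation(U, must_link=[(0, 1)], cannot_link=[(0, 2)])
# ... while constraints that contradict them give a higher one.
conflicting = constraint_violation(U, must_link=[(0, 2)], cannot_link=[(0, 1)])
print(consistent, conflicting)
```

In a semi-supervised scheme of this kind, a term like this would be combined with the unsupervised clustering objective, so that the membership updates trade off cluster quality against agreement with the user-provided constraints.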
Pages: 55 - 74
Page count: 19
Related papers
50 in total
  • [1] Semi-supervised fuzzy co-clustering algorithm for document categorization
    Yan, Yang
    Chen, Lihui
    Tjhi, William-Chandra
    [J]. KNOWLEDGE AND INFORMATION SYSTEMS, 2013, 34 (01) : 55 - 74
  • [2] Fuzzy semi-supervised co-clustering for text documents
    Yan, Yang
    Chen, Lihui
    Tjhi, William-Chandra
    [J]. FUZZY SETS AND SYSTEMS, 2013, 215 : 74 - 89
  • [3] A Semi-supervised Fuzzy Co-clustering Framework and Application to Twitter Data Analysis
    Honda, Katsuhiro
    Ubukata, Seiki
    Notsu, Akira
    Takahashi, Norimitsu
    Ishikawa, Yutaka
    [J]. 2015 4TH INTERNATIONAL CONFERENCE ON INFORMATICS, ELECTRONICS & VISION ICIEV 15, 2015
  • [4] Constraint Co-Projections for Semi-Supervised Co-Clustering
    Huang, Shudong
    Wang, Hongjun
    Li, Tao
    Yang, Yan
    Li, Tianrui
    [J]. IEEE TRANSACTIONS ON CYBERNETICS, 2016, 46 (12) : 3047 - 3058
  • [5] Orthogonal Nonnegative Matrix Tri-factorization for Semi-supervised Document Co-clustering
    Ma, Huifang
    Zhao, Weizhong
    Tan, Qing
    Shi, Zhongzhi
    [J]. ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PT II, PROCEEDINGS, 2010, 6119 : 189 - +
  • [6] Semi-supervised Document Clustering with Simultaneous Text Representation and Categorization
    Chen, Yanhua
    Wang, Lijun
    Dong, Ming
    [J]. MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, PT I, 2009, 5781 : 211 - 226
  • [7] A Semi-Supervised Framework for MMMs-Induced Fuzzy Co-Clustering with Virtual Samples
    Tanaka, Daiji
    Honda, Katsuhiro
    Ubukata, Seiki
    Notsu, Akira
    [J]. ADVANCES IN FUZZY SYSTEMS, 2016, 2016
  • [8] A Kernel Probabilistic Model for Semi-supervised Co-clustering Ensemble
    Zhang, Yinghui
    [J]. JOURNAL OF INTELLIGENT SYSTEMS, 2020, 29 (01) : 143 - 153
  • [9] Semi-Supervised Heterogeneous Fusion for Multimedia Data Co-Clustering
    Meng, Lei
    Tan, Ah-Hwee
    Xu, Dong
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2014, 26 (09) : 2293 - 2306
  • [10] Semi-supervised Co-Clustering on Attributed Heterogeneous Information Networks
    Ji, Yugang
    Shi, Chuan
    Fang, Yuan
    Kong, Xiangnan
    Yin, Mingyang
    [J]. INFORMATION PROCESSING & MANAGEMENT, 2020, 57 (06)