Active Learning of Constraints for Semi-supervised Text Clustering

Cited by: 0
Authors
Huang, Ruizhang [1]
Lam, Wai [1]
Zhang, Zhigang [1]
Affiliation
[1] Chinese Univ Hong Kong, Dept Syst Engn & Engn Management, Shatin, Hong Kong, Peoples R China
Source
PROCEEDINGS OF THE SEVENTH SIAM INTERNATIONAL CONFERENCE ON DATA MINING | 2007
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper investigates active learning of constraints for semi-supervised document clustering. We use intermediate clustering results to guide the selection of document pairs for which user judgments are obtained to generate constraints. A gain function is designed to choose the most informative document pairs given the current cluster assignments; it measures how much can be learned by revealing the judgment of a document pair. Two methods are investigated: the independent gain model and the dependent gain model. In the independent gain model, we assume that the information learned by revealing the judgment of a document pair is independent of revealing the judgments of other document pairs. The dependent gain model also considers previously chosen documents to avoid redundant selection and to maximize the gain collectively for a set of document pairs. Constrained semi-supervised clustering and gain-directed document pair selection are conducted iteratively. We have conducted extensive experiments on several real-world corpora. The results demonstrate that the intermediate clustering assignments and the interactions among a set of document pairs are useful for improving clustering performance. Our approach also outperforms a recent existing method for this problem.
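The abstract does not state the form of the gain function, so the following Python sketch is only an illustration of the contrast between the two selection strategies, not the authors' method. It assumes soft cluster memberships are available from the intermediate clustering, uses the binary entropy of the same-cluster probability as a stand-in gain, and models dependence with a hypothetical `overlap_penalty` that discounts pairs reusing already chosen documents. All function names and the toy data are assumptions.

```python
import math
from itertools import combinations


def same_cluster_prob(p_i, p_j):
    """Probability that two documents share a cluster, assuming their
    soft assignments are independent (an assumption of this sketch)."""
    return sum(a * b for a, b in zip(p_i, p_j))


def binary_entropy(p):
    """Entropy in bits of a yes/no judgment with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def independent_gain(pair, memberships):
    """Stand-in gain: uncertainty about whether the pair is must-link."""
    i, j = pair
    return binary_entropy(same_cluster_prob(memberships[i], memberships[j]))


def select_independent(memberships, budget):
    """Rank every pair by its own gain and take the top `budget` pairs."""
    pairs = combinations(range(len(memberships)), 2)
    ranked = sorted(pairs, key=lambda pr: independent_gain(pr, memberships),
                    reverse=True)
    return ranked[:budget]


def select_dependent(memberships, budget, overlap_penalty=0.5):
    """Greedy selection that discounts pairs sharing documents with
    pairs already chosen (hypothetical form of dependence)."""
    candidates = list(combinations(range(len(memberships)), 2))
    chosen, used_docs = [], set()
    while candidates and len(chosen) < budget:
        def score(pair):
            overlap = sum(1 for d in pair if d in used_docs)
            return independent_gain(pair, memberships) * overlap_penalty ** overlap
        best = max(candidates, key=score)
        chosen.append(best)
        used_docs.update(best)
        candidates.remove(best)
    return chosen


# Toy soft memberships for four documents over two clusters (made-up values).
memberships = [
    [0.60, 0.40],
    [0.70, 0.30],
    [0.20, 0.80],
    [0.95, 0.05],
]

print(select_independent(memberships, 2))  # [(0, 1), (0, 2)]
print(select_dependent(memberships, 2))    # [(0, 1), (2, 3)]
```

In this toy run the independent ranking queries document 0 twice, while the greedy dependent variant spreads the two queries over all four documents, which is the kind of redundancy avoidance the abstract describes. In an iterative setting, the returned pairs would be judged by the user, turned into constraints, and the clustering rerun before the next selection round.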
Pages: 113-124
Number of pages: 12