Constrained Co-Clustering for Textual Documents

Times cited: 0
Authors
Song, Yangqiu [1]
Pan, Shimei [2]
Liu, Shixia [1]
Wei, Furu [1]
Zhou, Michelle X. [3]
Qian, Weihong [1]
Affiliations
[1] IBM Research China, Beijing, People's Republic of China
[2] IBM Research, T. J. Watson Research Center, Hawthorne, NY, USA
[3] IBM Research, Almaden Research Center, San Jose, CA, USA
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we present a constrained co-clustering approach for clustering textual documents. Our approach combines the benefits of information-theoretic co-clustering and constrained clustering. We use a two-sided hidden Markov random field (HMRF) to model both the document and word constraints. We also develop an alternating expectation maximization (EM) algorithm to optimize the constrained co-clustering model. We have conducted two sets of experiments on a benchmark data set: (1) using human-provided category labels to derive document and word constraints for semi-supervised document clustering, and (2) using automatically extracted named entities to derive document constraints for unsupervised document clustering. Compared to several representative constrained clustering and co-clustering approaches, our approach is shown to be more effective for high-dimensional, sparse text data.
Pages: 581-586
Number of pages: 6