Scalable Overlapping Co-Clustering of Word-Document Data

Cited by: 6
Author
de Franca, Fabricio Olivetti [1]
Affiliation
[1] Fed Univ ABC UFABC, CMCC, R Santa Adelia 166, BR-09210170 Santo Andre, Brazil
Keywords
co-clustering; text clustering; hashing
DOI
10.1109/ICMLA.2012.84
Chinese Library Classification number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Text clustering is used in a variety of applications, such as content-based recommendation, categorization, summarization, information retrieval and automatic topic extraction. Since most pairs of documents share only a small percentage of words, the dataset representation tends to be very sparse, hence the need for a similarity metric capable of partial matching over a set of features. The technique known as co-clustering is capable of finding several clusters inside a dataset, with each cluster composed of just a subset of the object and feature sets. In word-document data this can be useful to identify clusters of documents pertaining to the same topic, even though they share just a small fraction of words. In this paper a scalable co-clustering algorithm is proposed that uses the locality-sensitive hashing technique to find co-clusters of documents. The proposed algorithm is tested against other co-clustering and traditional algorithms on well-known datasets. The results show that this algorithm is capable of finding clusters more accurately than other approaches while maintaining linear complexity.
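The abstract does not give the algorithm itself, so the following is only a minimal sketch of the general idea it names: using locality-sensitive hashing (here MinHash with banding, which is an assumption and not confirmed as the paper's scheme) to group documents that share part of their vocabulary, then reading off the words common to each group as the feature side of a candidate co-cluster. The function names `minhash_signature`, `lsh_buckets` and `candidate_coclusters`, and the parameters `num_hashes`, `bands` and `min_docs`, are hypothetical and chosen for illustration.

```python
import hashlib
from collections import defaultdict


def minhash_signature(words, num_hashes=8):
    """MinHash signature of a non-empty word set (illustrative helper).

    For each seed, hash every word and keep the minimum value.
    """
    signature = []
    for seed in range(num_hashes):
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(f"{seed}:{w}".encode(), digest_size=8).digest(),
                "big",
            )
            for w in words
        ))
    return tuple(signature)


def lsh_buckets(docs, num_hashes=8, bands=4):
    """Place each document in one bucket per band of its signature.

    docs maps a document id to a non-empty set of words.
    Assumes num_hashes is divisible by bands.
    """
    rows = num_hashes // bands
    buckets = defaultdict(set)
    for doc_id, words in docs.items():
        sig = minhash_signature(words, num_hashes)
        for b in range(bands):
            buckets[(b, sig[b * rows:(b + 1) * rows])].add(doc_id)
    return buckets


def candidate_coclusters(docs, min_docs=2, num_hashes=8, bands=4):
    """Return (document set, shared word set) pairs from colliding buckets."""
    seen = set()
    result = []
    for members in lsh_buckets(docs, num_hashes, bands).values():
        key = frozenset(members)
        if len(members) < min_docs or key in seen:
            continue
        seen.add(key)
        # The feature subset of the co-cluster: words present in every member.
        shared = set.intersection(*(set(docs[d]) for d in members))
        if shared:
            result.append((set(members), shared))
    return result
```

Each document is hashed once, so the bucketing pass is linear in the number of documents for a fixed signature length. Two documents with Jaccard similarity s collide in at least one band with probability 1 - (1 - s^r)^b, where r is the number of rows per band and b the number of bands, which is what allows documents with only partial word overlap to be grouped.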
Pages: 464-467
Number of pages: 4