Scalable Overlapping Co-Clustering of Word-Document Data

Cited by: 6
Author
de Franca, Fabricio Olivetti [1]
Affiliation
[1] Fed Univ ABC UFABC, CMCC, R Santa Adelia 166, BR-09210170 Santo Andre, Brazil
Keywords
co-clustering; text clustering; hashing
DOI
10.1109/ICMLA.2012.84
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Text clustering is used in a variety of applications, such as content-based recommendation, categorization, summarization, information retrieval and automatic topic extraction. Since most pairs of documents share only a small percentage of words, the dataset representation tends to be very sparse, which calls for a similarity metric capable of partially matching a set of features. The technique known as co-clustering can find several clusters inside a dataset, with each cluster composed of just a subset of the object and feature sets. In word-document data, this is useful for identifying clusters of documents pertaining to the same topic even though they share only a small fraction of words. In this paper, a scalable co-clustering algorithm is proposed that uses locality-sensitive hashing to find co-clusters of documents. The proposed algorithm is tested against other co-clustering and traditional algorithms on well-known datasets. The results show that this algorithm finds clusters more accurately than the other approaches while maintaining linear complexity.
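The abstract does not describe the algorithm itself, so the following is only a minimal sketch of the general idea it names: using locality-sensitive hashing (here MinHash with banding, which is an assumption) to group documents that share a subset of words, then reading off the shared words as the feature side of a candidate co-cluster. The function names, `num_hashes` and `band_size` are hypothetical and are not taken from the paper.

```python
import hashlib
from collections import defaultdict


def minhash_signature(words, num_hashes=8):
    """MinHash signature of a non-empty word set: for each seeded hash
    function, keep the smallest hash value over all words."""
    return tuple(
        min(
            int.from_bytes(hashlib.md5(f"{seed}:{w}".encode()).digest()[:8], "big")
            for w in words
        )
        for seed in range(num_hashes)
    )


def lsh_buckets(docs, num_hashes=8, band_size=2):
    """Hash each document once and place it in one bucket per signature band.
    Documents that agree on a whole band end up in the same bucket."""
    buckets = defaultdict(set)
    for doc_id, words in docs.items():
        sig = minhash_signature(words, num_hashes)
        for start in range(0, num_hashes, band_size):
            buckets[(start, sig[start:start + band_size])].add(doc_id)
    return buckets


def candidate_coclusters(docs, num_hashes=8, band_size=2):
    """Return (document set, shared word set) pairs for every distinct
    bucket holding at least two documents (illustrative only)."""
    seen, result = set(), []
    for members in lsh_buckets(docs, num_hashes, band_size).values():
        key = frozenset(members)
        if len(key) < 2 or key in seen:
            continue
        seen.add(key)
        shared = set.intersection(*(docs[d] for d in key))
        if shared:
            result.append((key, shared))
    return result


# Hypothetical toy corpus: document id -> set of words.
docs = {
    "a": {"hash", "cluster", "text"},
    "b": {"hash", "cluster", "text"},
    "c": {"protein", "gene"},
}
coclusters = candidate_coclusters(docs)
```

Each document is hashed a fixed number of times regardless of corpus size, so the bucketing step grows linearly with the number of documents, which is the property the abstract emphasizes. Documents with only partial word overlap collide with a probability that depends on their Jaccard similarity and on the chosen band size.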
Pages: 464-467
Number of pages: 4
Related papers
50 in total
  • [21] Sleeved co-clustering of lagged data
    Eran Shaham
    David Sarne
    Boaz Ben-Moshe
    Knowledge and Information Systems, 2012, 31 : 251 - 279
  • [22] Co-clustering of fuzzy lagged data
    Eran Shaham
    David Sarne
    Boaz Ben-Moshe
    Knowledge and Information Systems, 2015, 44 : 217 - 252
  • [23] Co-clustering of fuzzy lagged data
    Shaham, Eran
    Sarne, David
    Ben-Moshe, Boaz
    KNOWLEDGE AND INFORMATION SYSTEMS, 2015, 44 (01) : 217 - 252
  • [24] Discovering Overlapping Community Structure In Networks through Co-clustering
    Kakkar, Sahil
    Beniwal, Sunita
    2016 INTERNATIONAL CONFERENCE ON INVENTIVE COMPUTATION TECHNOLOGIES (ICICT), VOL 1, 2016, : 336 - 341
  • [25] Co-clustering Sentences and Terms for Multi-document Summarization
    Xia, Yunqing
    Zhang, Yonggang
    Yao, Jianmin
    COMPUTATIONAL LINGUISTICS AND INTELLIGENT TEXT PROCESSING, PT II, 2011, 6609 : 339 - +
  • [26] A new fuzzy co-clustering algorithm for categorization of datasets with overlapping clusters
    Tjhi, William-Chandra
    Chen, Lihui
    ADVANCED DATA MINING AND APPLICATIONS, PROCEEDINGS, 2006, 4093 : 328 - 339
  • [27] Co-clustering for Binary Data with Maximum Modularity
    Labiod, Lazhar
    Nadif, Mohamed
    NEURAL INFORMATION PROCESSING, PT II, 2011, 7063 : 700 - 708
  • [28] CO-CLUSTERING OF SPATIALLY RESOLVED TRANSCRIPTOMIC DATA
    Sottosanti, Andrea
    Risso, Davide
    ANNALS OF APPLIED STATISTICS, 2023, 17 (02) : 1444 - 1468
  • [29] CO-CLUSTERING SEPARATELY EXCHANGEABLE NETWORK DATA
    Choi, David
    Wolfe, Patrick J.
    ANNALS OF STATISTICS, 2014, 42 (01) : 29 - 63
  • [30] A fuzzy co-clustering algorithm for biomedical data
    Liu, Yongli
    Wu, Shuai
    Liu, Zhizhong
    Chao, Hao
    PLOS ONE, 2017, 12 (04):