Scalable Overlapping Co-Clustering of Word-Document Data

Cited by: 6

Authors:

de Franca, Fabricio Olivetti [1]

Affiliation:

[1] Fed Univ ABC UFABC, CMCC, R Santa Adelia 166, BR-09210170 Santo Andre, Brazil

Source:

2012 11TH INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS (ICMLA 2012), VOL 1 | 2012

Keywords:

co-clustering; text clustering; hashing;

DOI:

10.1109/ICMLA.2012.84

Chinese Library Classification (CLC):

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Text clustering is used in a variety of applications, such as content-based recommendation, categorization, summarization, information retrieval, and automatic topic extraction. Since most pairs of documents share only a small percentage of their words, the dataset representation tends to be very sparse, which calls for a similarity metric capable of partially matching a set of features. Co-clustering can find several clusters inside a dataset, each composed of only a subset of the objects and a subset of the features. In word-document data, this is useful for identifying clusters of documents on the same topic even when they share just a small fraction of words. This paper proposes a scalable co-clustering algorithm that uses locality-sensitive hashing to find co-clusters of documents. The proposed algorithm is tested against other co-clustering and traditional algorithms on well-known datasets. The results show that it finds clusters more accurately than the other approaches while maintaining linear complexity.
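The abstract does not say which hash family or grouping rule the paper uses, so the following is only a minimal sketch of the general idea, not the author's algorithm. It assumes MinHash signatures with banding as the locality-sensitive hashing scheme, treats each document as a set of words, and reports each bucket of colliding documents together with the words they all share as a candidate co-cluster. The function names (word_hash, minhash_signature, candidate_co_clusters) and the parameters num_hashes and bands are hypothetical choices made for illustration.

```python
import hashlib
from collections import defaultdict


def word_hash(word, seed):
    """Deterministic 64-bit hash of a word under a given seed."""
    data = f"{seed}:{word}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(words, num_hashes=32):
    """MinHash signature: the smallest hash value of the set under each seed.

    Assumes `words` is a non-empty set of strings.
    """
    return tuple(
        min(word_hash(w, seed) for w in words) for seed in range(num_hashes)
    )


def candidate_co_clusters(docs, num_hashes=32, bands=16):
    """Group documents whose signatures agree on at least one band.

    `docs` maps a document id to its set of words. Each returned item is a
    pair (document ids, words shared by all of those documents), i.e. a
    subset of objects paired with a subset of features.
    """
    rows = num_hashes // bands  # signature entries per band
    buckets = defaultdict(set)
    for doc_id, words in docs.items():
        if not words:
            continue  # an empty document has no signature
        sig = minhash_signature(words, num_hashes)
        for b in range(bands):
            key = (b, sig[b * rows:(b + 1) * rows])
            buckets[key].add(doc_id)

    found = set()
    for doc_ids in buckets.values():
        if len(doc_ids) < 2:
            continue
        shared = set.intersection(*(docs[d] for d in doc_ids))
        if shared:
            found.add((frozenset(doc_ids), frozenset(shared)))
    return found


if __name__ == "__main__":
    # Toy word-document data, invented for illustration only.
    docs = {
        "d1": {"hash", "cluster", "document", "word"},
        "d2": {"hash", "cluster", "document", "topic"},
        "d3": {"football", "league", "goal"},
    }
    for doc_ids, words in candidate_co_clusters(docs):
        print(sorted(doc_ids), sorted(words))
```

Each document is hashed a fixed number of times and placed into a fixed number of buckets, so the bucketing pass grows linearly with the number of documents. Whether this matches the complexity argument made in the paper cannot be determined from the abstract alone.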


Pages: 464-467

Page count: 4


