Scalable Overlapping Co-Clustering of Word-Document Data

Cited by: 6

Authors:

de Franca, Fabricio Olivetti [1]

Affiliations:

[1] Fed Univ ABC UFABC, CMCC, R Santa Adelia 166, BR-09210170 Santo Andre, Brazil

Source:

2012 11TH INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS (ICMLA 2012), VOL 1 | 2012

Keywords:

co-clustering; text clustering; hashing;

DOI:

10.1109/ICMLA.2012.84

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Text clustering is used in a variety of applications such as content-based recommendation, categorization, summarization, information retrieval and automatic topic extraction. Since most pairs of documents share just a small percentage of words, the dataset representation tends to become very sparse, hence the need for a similarity metric capable of partially matching a set of features. The technique known as Co-Clustering is capable of finding several clusters inside a dataset, with each cluster composed of just a subset of the object and feature sets. In word-document data this can be useful to identify clusters of documents pertaining to the same topic, even though they share just a small fraction of words. In this paper a scalable co-clustering algorithm is proposed that uses the Locality-sensitive hashing technique to find co-clusters of documents. The proposed algorithm is tested against other co-clustering and traditional algorithms on well-known datasets. The results show that this algorithm is capable of finding clusters more accurately than other approaches while maintaining linear complexity.
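
The abstract does not describe the algorithm itself, so the following is only a generic illustration of how Locality-sensitive hashing can group documents that share words, and how a group of documents can be paired with its common words to form a co-cluster. It uses MinHash with banding, which is an assumption and may differ from the hashing scheme in the paper. The function names (`minhash_signature`, `lsh_buckets`, `co_cluster`) and the parameters `num_hashes` and `band_size` are hypothetical, and each document is assumed to be a non-empty set of words.

```python
import hashlib
from collections import defaultdict


def minhash_signature(words, num_hashes=8):
    """Return a MinHash signature: for each seed, the minimum hash over the words.

    Assumes `words` is a non-empty collection of strings.
    """
    signature = []
    for seed in range(num_hashes):
        signature.append(min(
            int.from_bytes(hashlib.md5(f"{seed}:{w}".encode()).digest()[:8], "big")
            for w in words
        ))
    return tuple(signature)


def lsh_buckets(docs, num_hashes=8, band_size=2):
    """Return groups of document ids whose signatures agree on at least one band.

    Each document is hashed once, so the cost grows linearly with the
    number of documents for a fixed number of hash functions.
    """
    buckets = defaultdict(set)
    for doc_id, words in docs.items():
        sig = minhash_signature(words, num_hashes)
        for start in range(0, num_hashes, band_size):
            # Include the band position in the key so bands do not mix.
            band = (start, sig[start:start + band_size])
            buckets[band].add(doc_id)
    return [ids for ids in buckets.values() if len(ids) > 1]


def co_cluster(docs, doc_ids):
    """Pair a group of documents with the words shared by all of them."""
    shared = set.intersection(*(set(docs[d]) for d in doc_ids))
    return set(doc_ids), shared


if __name__ == "__main__":
    # Toy data invented for illustration only.
    docs = {
        "d1": {"hashing", "cluster", "text"},
        "d2": {"hashing", "cluster", "text"},
        "d3": {"football", "league"},
    }
    for group in lsh_buckets(docs):
        print(co_cluster(docs, group))
```

Documents with a higher word overlap are more likely to collide in some band, so the candidate groups tend to contain documents on the same topic even when they share only part of their vocabulary. Each resulting pair (document subset, shared word subset) is one candidate co-cluster.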

Pages: 464-467

Page count: 4

50 items in total

[41] Bayesian co-clustering
Domeniconi, Carlotta
Laskey, Kathryn
WILEY INTERDISCIPLINARY REVIEWS-COMPUTATIONAL STATISTICS, 2015, 7 (05) : 347 - 356
[42] Model-based co-clustering for ordinal data
Jacques, Julien
Biernacki, Christophe
COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2018, 123 : 101 - 115
[43] Model-based co-clustering for functional data
Ben Slimen, Yosra
Allio, Sylvain
Jacques, Julien
NEUROCOMPUTING, 2018, 291 : 97 - 108
[44] Bipartite isoperimetric graph partitioning for data co-clustering
Rege, Manjeet
Dong, Ming
Fotouhi, Farshad
DATA MINING AND KNOWLEDGE DISCOVERY, 2008, 16 (03) : 276 - 312
[45] CFOND: Consensus Factorization for Co-Clustering Networked Data
Guo, Ting
Pan, Shirui
Zhu, Xingquan
Zhang, Chengqi
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2019, 31 (04) : 706 - 719
[46] A New Framework for Co-clustering of Gene Expression Data
Zhang, Shuzhong
Wang, Kun
Chen, Bilian
Huang, Xiuzhen
PATTERN RECOGNITION IN BIOINFORMATICS, 2011, 7036 : 1 - +
[47] Bipartite isoperimetric graph partitioning for data co-clustering
Manjeet Rege
Ming Dong
Farshad Fotouhi
Data Mining and Knowledge Discovery, 2008, 16 : 276 - 312
[48] Subspace Weighting Co-Clustering of Gene Expression Data
Chen, Xiaojun
Huang, Joshua Z.
Wu, Qingyao
Yang, Min
IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2019, 16 (02) : 352 - 364
[49] Scalable Algorithm for Higher-Order Co-Clustering via Random Sampling
Hatano, Daisuke
Fukunaga, Takuro
Maehara, Takanori
Kawarabayashi, Ken-ichi
THIRTY-FIRST AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2017, : 1992 - 1999
[50] Scalable co-Clustering using a Crossing Minimization - Application to Production Flow Analysis
Pigler, Csaba
Fogarassy-Vathy, Agnes
Abonyi, Janos
ACTA POLYTECHNICA HUNGARICA, 2016, 13 (02) : 209 - 228
