Scalable Overlapping Co-Clustering of Word-Document Data

Cited by: 6
Author
de Franca, Fabricio Olivetti [1]
Affiliation
[1] Fed Univ ABC UFABC, CMCC, R Santa Adelia 166, BR-09210170 Santo Andre, Brazil
Keywords
co-clustering; text clustering; hashing
DOI
10.1109/ICMLA.2012.84
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Text clustering is used in a variety of applications such as content-based recommendation, categorization, summarization, information retrieval and automatic topic extraction. Since most pairs of documents share only a small percentage of words, the dataset representation tends to be very sparse, hence the need for a similarity metric capable of partially matching a set of features. The technique known as co-clustering can find several clusters inside a dataset, with each cluster composed of just a subset of the object and feature sets. In word-document data this is useful for identifying clusters of documents pertaining to the same topic, even though they share just a small fraction of words. In this paper a scalable co-clustering algorithm is proposed that uses the locality-sensitive hashing technique to find co-clusters of documents. The proposed algorithm is tested against other co-clustering and traditional clustering algorithms on well-known datasets. The results show that this algorithm is capable of finding clusters more accurately than the other approaches while maintaining linear complexity.
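The abstract does not say which hashing family or bucketing scheme the paper uses, so the following is only a minimal sketch of the general idea, assuming MinHash as the locality-sensitive hash and a simple banding scheme. All function names and parameters (`token_hash`, `minhash_signature`, `lsh_candidate_groups`, `num_hashes`, `band_size`) are hypothetical and are not taken from the paper. Documents whose word sets overlap are likely to collide in at least one band, and each colliding group together with the words its members share forms a candidate co-cluster (a subset of documents paired with a subset of words).

```python
import hashlib
from collections import defaultdict


def token_hash(token: str, seed: int) -> int:
    """Deterministic 64-bit hash of a token under a given seed (illustrative choice)."""
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(words, num_hashes=8):
    """MinHash signature: for each seed, the smallest hash value over the word set."""
    return tuple(
        min(token_hash(w, seed) for w in words) for seed in range(num_hashes)
    )


def lsh_candidate_groups(docs, num_hashes=8, band_size=2):
    """Group documents that collide in at least one band of their signature.

    docs maps a document id to its set of words. Returns a list of
    (document ids, shared words) pairs, one per distinct colliding group.
    """
    buckets = defaultdict(set)
    for doc_id, words in docs.items():
        if not words:
            continue  # an empty document has no signature
        sig = minhash_signature(words, num_hashes)
        for start in range(0, num_hashes, band_size):
            band = sig[start:start + band_size]
            buckets[(start, band)].add(doc_id)

    groups = {}
    for members in buckets.values():
        if len(members) >= 2:
            shared = set.intersection(*(docs[d] for d in members))
            groups[frozenset(members)] = shared
    return [(sorted(ids), sorted(words)) for ids, words in groups.items()]


if __name__ == "__main__":
    docs = {
        "a": {"hashing", "cluster", "text"},
        "b": {"hashing", "cluster", "text"},
        "c": {"protein", "gene", "cell"},
    }
    for ids, words in lsh_candidate_groups(docs):
        print(ids, words)
```

Each document is hashed a fixed number of times and inserted into a fixed number of buckets, so the bucketing pass is linear in the number of documents for a fixed signature length. This mirrors the kind of scalability the abstract claims, but it says nothing about the accuracy or the exact complexity of the paper's own method.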
Pages: 464-467
Page count: 4