Scalable Overlapping Co-Clustering of Word-Document Data

Cited by: 6
Author
de Franca, Fabricio Olivetti [1]
Affiliation
[1] Fed Univ ABC UFABC, CMCC, R Santa Adelia 166, BR-09210170 Santo Andre, Brazil
Keywords
co-clustering; text clustering; hashing
DOI
10.1109/ICMLA.2012.84
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Text clustering is used in a variety of applications, such as content-based recommendation, categorization, summarization, information retrieval, and automatic topic extraction. Because most pairs of documents share only a small percentage of their words, the dataset representation tends to be very sparse, which calls for a similarity metric capable of partially matching a set of features. The technique known as co-clustering can find several clusters inside a dataset, with each cluster composed of just a subset of the objects and a subset of the features. In word-document data, this is useful for identifying clusters of documents that pertain to the same topic even though they share only a small fraction of their words. This paper proposes a scalable co-clustering algorithm that uses locality-sensitive hashing to find co-clusters of documents. The proposed algorithm is tested against other co-clustering algorithms and traditional clustering algorithms on well-known datasets. The results show that it finds clusters more accurately than the other approaches while maintaining linear complexity.
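The abstract does not spell out the hashing scheme, so the following is only a minimal sketch of how locality-sensitive hashing can group documents that share a subset of words. It assumes a MinHash-style key over word sets; the names `word_hash`, `minhash_key`, and `candidate_co_clusters`, and the parameters `num_tables` and `hashes_per_key`, are hypothetical and are not taken from the paper. Each document is hashed once per table, so the cost grows linearly with the number of documents, and a document can land in buckets of several tables, which is what allows the resulting groups to overlap.

```python
import hashlib
from collections import defaultdict


def word_hash(word: str, seed: int) -> int:
    """Deterministic 64-bit hash of a word under a given seed."""
    data = f"{seed}:{word}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_key(words: set[str], seeds: range) -> tuple[str, ...]:
    """For each seed, keep the word with the smallest hash value.

    Two documents get the same word for a given seed with probability
    equal to the Jaccard similarity of their word sets.
    """
    return tuple(min(words, key=lambda w: word_hash(w, seed)) for seed in seeds)


def candidate_co_clusters(
    docs: dict[str, set[str]],
    num_tables: int = 8,      # assumption: number of independent hash tables
    hashes_per_key: int = 2,  # assumption: min-hashes concatenated per key
    min_docs: int = 2,
) -> dict[frozenset, frozenset]:
    """Map each candidate document group to the words all its members share."""
    found: dict[frozenset, frozenset] = {}
    for table in range(num_tables):
        seeds = range(table * hashes_per_key, (table + 1) * hashes_per_key)
        buckets: dict[tuple, list[str]] = defaultdict(list)
        for doc_id, words in docs.items():
            if words:  # skip empty documents, which have no min-hash
                buckets[minhash_key(words, seeds)].append(doc_id)
        for members in buckets.values():
            if len(members) >= min_docs:
                shared = set(docs[members[0]])
                for doc_id in members[1:]:
                    shared &= docs[doc_id]
                # The key words belong to every member, so `shared` is non-empty.
                found[frozenset(members)] = frozenset(shared)
    return found


if __name__ == "__main__":
    corpus = {
        "d1": {"hash", "bucket", "cluster", "text"},
        "d2": {"hash", "bucket", "cluster", "topic"},
        "d3": {"soccer", "goal", "league"},
    }
    for doc_group, word_group in candidate_co_clusters(corpus).items():
        print(sorted(doc_group), sorted(word_group))
```

Each returned pair is a candidate co-cluster: a subset of documents together with the subset of words they all contain. Raising `hashes_per_key` makes buckets stricter, while raising `num_tables` gives similar documents more chances to collide.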
Pages: 464-467
Number of pages: 4
Related papers
50 in total
  • [1] Scalable and interpretable product recommendations via overlapping co-clustering
    Heckel, Reinhard
    Vlachos, Michail
    Parnell, Thomas
    Duenner, Celestine
    2017 IEEE 33RD INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2017), 2017, : 1033 - 1044
  • [2] Diagonal Co-clustering Algorithm for Document-Word Partitioning
    Laclau, Charlotte
    Nadif, Mohamed
    ADVANCES IN INTELLIGENT DATA ANALYSIS XIV, 2015, 9385 : 170 - 180
  • [3] Scalable Co-clustering Algorithms
    Kwon, Bongjune
    Cho, Hyuk
    ALGORITHMS AND ARCHITECTURES FOR PARALLEL PROCESSING, PT 1, PROCEEDINGS, 2010, 6081 : 32 - +
  • [4] Text sentiment classification based on a genetic algorithm and word and document co-clustering
    E. V. Kotelnikov
    M. V. Pletneva
    Journal of Computer and Systems Sciences International, 2016, 55 : 106 - 114
  • [5] Text sentiment classification based on a genetic algorithm and word and document co-clustering
    Kotelnikov, E. V.
    Pletneva, M. V.
    JOURNAL OF COMPUTER AND SYSTEMS SCIENCES INTERNATIONAL, 2016, 55 (01) : 106 - 114
  • [6] Non-Exhaustive, Overlapping Co-Clustering
    Whang, Joyce Jiyoung
    Dhillon, Inderjit S.
    CIKM'17: PROCEEDINGS OF THE 2017 ACM CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, 2017, : 2367 - 2370
  • [7] Scalable Ensemble Information-Theoretic Co-clustering for Massive Data
    Huang, Qizhen
    Chen, Xiaojun
    Huang, Joshua Zhexue
    Feng, Shengzhong
    Fan, Jianping
    INTERNATIONAL MULTICONFERENCE OF ENGINEERS AND COMPUTER SCIENTISTS, IMECS 2012, VOL I, 2012, : 563 - 568
  • [8] Joint co-clustering: Co-clustering of genomic and clinical bioimaging data
    Ficarra, Elisa
    De Micheli, Giovanni
    Yoon, Sungroh
    Benini, Luca
    Macii, Enrico
    COMPUTERS & MATHEMATICS WITH APPLICATIONS, 2008, 55 (05) : 938 - 949
  • [9] I/O scalable Bregman co-clustering
    Hsu, Kuo-Wei
    Banerjee, Arindam
    Srivastava, Jaideep
    ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PROCEEDINGS, 2008, 5012 : 896 - 903
  • [10] Hierarchical and Overlapping Co-Clustering of mRNA: miRNA Interactions
    Pio, Gianvito
    Ceci, Michelangelo
    Loglisci, Corrado
    D'Elia, Domenica
    Malerba, Donato
    20TH EUROPEAN CONFERENCE ON ARTIFICIAL INTELLIGENCE (ECAI 2012), 2012, 242 : 654 - +