Textual data summarization using the Self-Organized Co-Clustering model

Cited by: 10
Authors
Selosse, Margot [1]
Jacques, Julien [1]
Biernacki, Christophe [2, 3]
Affiliations
[1] Univ Lyon, Lyon & ERIC EA3083 2, 5 Ave Pierre Mendes, Bron 69500, France
[2] Univ Lille, UFR Math, Cite Sci, Villeneuve d'Ascq 59655, France
[3] INRIA, 40 Av Halley, Bat A, Pk Plaza, Villeneuve d'Ascq 59650, France
Keywords
Co-Clustering; Document-term matrix; Latent block model; Factorization; Matrix
DOI
10.1016/j.patcog.2020.107315
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Recent studies have demonstrated the usefulness of co-clustering, a data mining technique that simultaneously produces row-clusters of observations and column-clusters of features. The present work introduces a novel co-clustering model to easily summarize textual data in a document-term format. In addition to highlighting homogeneous co-clusters, as other existing algorithms do, we also distinguish noisy co-clusters from significant co-clusters, which is particularly useful for sparse document-term matrices. Furthermore, our model proposes a structure among the significant co-clusters, thus providing improved interpretability to users. The proposed approach competes with state-of-the-art methods for document and term clustering and offers user-friendly results. The model relies on the Poisson distribution and on a constrained version of the Latent Block Model, which is a probabilistic approach for co-clustering. A Stochastic Expectation-Maximization algorithm is proposed for the model's inference, along with a model selection criterion for choosing the number of co-clusters. Both simulated and real data sets illustrate the efficiency of this model through its ability to easily identify relevant co-clusters. © 2020 Elsevier Ltd. All rights reserved.
Pages: 13