Textual data summarization using the Self-Organized Co-Clustering model

Cited by: 10
Authors
Selosse, Margot [1]
Jacques, Julien [1]
Biernacki, Christophe [2, 3]
Affiliations
[1] Univ Lyon, Lyon & ERIC EA3083 2, 5 Ave Pierre Mendes, Bron 69500, France
[2] Univ Lille, UFR Math, Cite Sci, Villeneuve Dascq 59655, France
[3] INRIA, 40 Av Halley, Bat A, Pk Plaza, Villeneuve Dascq 59650, France
Keywords
Co-Clustering; Document-term matrix; Latent block model; Factorization; Matrix
DOI
10.1016/j.patcog.2020.107315
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Recently, different studies have demonstrated the use of co-clustering, a data mining technique which simultaneously produces row-clusters of observations and column-clusters of features. The present work introduces a novel co-clustering model to easily summarize textual data in a document-term format. In addition to highlighting homogeneous co-clusters, as other existing algorithms do, we also distinguish noisy co-clusters from significant co-clusters, which is particularly useful for sparse document-term matrices. Furthermore, our model proposes a structure among the significant co-clusters, thus providing improved interpretability to users. The proposed approach competes with state-of-the-art methods for document and term clustering and offers user-friendly results. The model relies on the Poisson distribution and on a constrained version of the Latent Block Model, which is a probabilistic approach for co-clustering. A Stochastic Expectation-Maximization algorithm is proposed to run the model's inference, as well as a model selection criterion to choose the number of co-clusters. Both simulated and real data sets illustrate the efficiency of this model by its ability to easily identify relevant co-clusters. © 2020 Elsevier Ltd. All rights reserved.
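As a rough illustration of the Poisson block structure the abstract refers to, the sketch below fits a plain, unconstrained Poisson block model to a toy document-term count matrix whose row and column partitions are already given. It is not the authors' Self-Organized Co-Clustering model: the constraints on the blocks, the Stochastic Expectation-Maximization inference, and the model selection criterion are all omitted, and the function names `block_rates` and `poisson_log_likelihood`, as well as the toy data, are assumptions made for illustration only.

```python
import math


def block_rates(counts, row_labels, col_labels, n_row_clusters, n_col_clusters):
    """Maximum-likelihood Poisson rate of each block: the mean count in that block."""
    totals = [[0.0] * n_col_clusters for _ in range(n_row_clusters)]
    sizes = [[0] * n_col_clusters for _ in range(n_row_clusters)]
    for i, row in enumerate(counts):
        for j, x in enumerate(row):
            k, l = row_labels[i], col_labels[j]
            totals[k][l] += x
            sizes[k][l] += 1
    return [
        [totals[k][l] / sizes[k][l] if sizes[k][l] else 0.0
         for l in range(n_col_clusters)]
        for k in range(n_row_clusters)
    ]


def poisson_log_likelihood(counts, row_labels, col_labels, rates):
    """Log-likelihood of the counts when cell (i, j) is Poisson with its block's rate."""
    ll = 0.0
    for i, row in enumerate(counts):
        for j, x in enumerate(row):
            lam = rates[row_labels[i]][col_labels[j]]
            if lam == 0.0:
                # A zero rate can only explain a zero count.
                if x > 0:
                    return float("-inf")
                continue
            ll += x * math.log(lam) - lam - math.lgamma(x + 1)
    return ll


# Toy document-term matrix (hypothetical): 3 documents, 4 terms.
counts = [
    [4, 5, 0, 0],
    [6, 5, 0, 1],
    [0, 0, 3, 3],
]
row_labels = [0, 0, 1]      # document clusters, assumed known here
col_labels = [0, 0, 1, 1]   # term clusters, assumed known here

rates = block_rates(counts, row_labels, col_labels, 2, 2)
log_lik = poisson_log_likelihood(counts, row_labels, col_labels, rates)
```

In this toy example, the blocks with near-zero rates play the role of the sparse, uninformative regions of a document-term matrix, while the high-rate blocks pair a group of documents with the terms that characterize it.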
Pages: 13
Related papers
50 in total
  • [31] Self-organized Clustering of Square Objects by Multiple Robots
    Song, Yong
    Kim, Jung-Hwan
    Shell, Dylan A.
    SWARM INTELLIGENCE (ANTS 2012), 2012, 7461 : 308 - 315
  • [32] Clustering with multilayer perceptrons and self-organized (Hebbian) learning
    Filho, Jugurta R. Montalvao
    Freire, Eduardo O.
    Bezerra, Murilo A., Jr.
    JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2007, 18 (05) : 501 - 511
  • [33] Co-clustering for Binary Data with Maximum Modularity
    Labiod, Lazhar
    Nadif, Mohamed
    NEURAL INFORMATION PROCESSING, PT II, 2011, 7063 : 700 - 708
  • [34] SELF-ORGANIZED CLUSTERING FOR FEATURE MAPPING IN LANGUAGE RECOGNITION
    You, Chang Huai
    Lee, Kong Aik
    Ma, Bin
    Li, Haizhou
    2008 6TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, 2008 : 177 - 180
  • [35] CO-CLUSTERING OF SPATIALLY RESOLVED TRANSCRIPTOMIC DATA
    Sottosanti, Andrea
    Risso, Davide
    ANNALS OF APPLIED STATISTICS, 2023, 17 (02) : 1444 - 1468
  • [36] CO-CLUSTERING SEPARATELY EXCHANGEABLE NETWORK DATA
    Choi, David
    Wolfe, Patrick J.
    ANNALS OF STATISTICS, 2014, 42 (01) : 29 - 63
  • [37] Spatio-temporal climate regionalization using a self-organized clustering approach
    Chidean, Mihaela I.
    Caamano, Antonio J.
    Casanova-Mateo, Carlos
    Ramiro-Bargueno, Julio
    Salcedo-Sanz, Sancho
    THEORETICAL AND APPLIED CLIMATOLOGY, 2020, 140 (3-4) : 927 - 949
  • [39] Lossless Compression of Data Tables in Mobile Devices using Co-clustering
    Han, B.
    Li, B.
    INTERNATIONAL JOURNAL OF COMPUTERS COMMUNICATIONS & CONTROL, 2016, 11 (06) : 776 - 788
  • [40] A fuzzy co-clustering algorithm for biomedical data
    Liu, Yongli
    Wu, Shuai
    Liu, Zhizhong
    Chao, Hao
    PLOS ONE, 2017, 12 (04)