Graph and Embedding based Approach for Text Clustering: Topic Detection in a Large Multilingual Public Consultation

Cited by: 0
Authors
Stefanovitch, Nicolas [1]
Jacquet, Guillaume [1, 2]
de Longueville, Bertrand [1]
Affiliations
[1] European Commiss, Joint Res Ctr, Rome, Italy
[2] European Union, Publicat Off, Alicante, Spain
Keywords
graph clustering; multilingual embeddings; semantic similarity; topic detection; public consultation
DOI
10.1145/3543873.3587627
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We present a novel algorithm for multilingual text clustering built upon two well-studied techniques: multilingual aligned embeddings and community detection in graphs. The aim of our algorithm is to discover underlying topics in a multilingual dataset using clustering. We present both a numerical evaluation using silhouette and V-measure metrics, and a qualitative evaluation for which we propose a new systematic approach. Our algorithm presents robust overall performance and its results were empirically evaluated by an analyst. The work we present was done in the context of a large multilingual public consultation, for which our new algorithm was deployed and used on a daily basis.
Pages: 694-700
Page count: 7