Graph and Embedding based Approach for Text Clustering: Topic Detection in a Large Multilingual Public Consultation

Cited by: 0
Authors
Stefanovitch, Nicolas [1]
Jacquet, Guillaume [1, 2]
de Longueville, Bertrand [1]
Affiliations
[1] European Commission, Joint Research Centre, Rome, Italy
[2] European Union, Publications Office, Alicante, Spain
Keywords
graph clustering; multilingual embeddings; semantic similarity; topic detection; public consultation
DOI
10.1145/3543873.3587627
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We present a novel algorithm for multilingual text clustering built upon two well-studied techniques: multilingual aligned embeddings and community detection in graphs. The aim of our algorithm is to discover underlying topics in a multilingual dataset using clustering. We present both a numerical evaluation using silhouette and V-measure metrics, and a qualitative evaluation for which we propose a new systematic approach. Our algorithm presents robust overall performance and its results were empirically evaluated by an analyst. The work we present was done in the context of a large multilingual public consultation, for which our new algorithm was deployed and used on a daily basis.
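The abstract names the two building blocks (aligned embeddings and graph community detection) but not the specific models or algorithms used. The following is only a minimal sketch of the general idea, under loud assumptions: the 3-dimensional vectors are toy stand-ins for multilingual sentence embeddings, the similarity threshold of 0.8 is arbitrary, and a simple deterministic label propagation is swapped in as the community detection step. None of these choices is taken from the paper, and the function names are hypothetical.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def build_similarity_graph(vectors, threshold):
    """Connect two texts when their embedding similarity reaches the threshold."""
    graph = {i: {} for i in range(len(vectors))}
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            sim = cosine(vectors[i], vectors[j])
            if sim >= threshold:
                graph[i][j] = sim
                graph[j][i] = sim
    return graph


def label_propagation(graph, max_rounds=100):
    """Stand-in community detection (assumption, not the paper's method).

    Each node repeatedly adopts the label with the largest total edge
    weight among its neighbours; ties go to the smallest label.
    """
    labels = {node: node for node in graph}
    for _ in range(max_rounds):
        changed = False
        for node in sorted(graph):
            if not graph[node]:
                continue  # isolated node keeps its own label
            weight = defaultdict(float)
            for neighbour, w in graph[node].items():
                weight[labels[neighbour]] += w
            best = min(weight, key=lambda lab: (-weight[lab], lab))
            if best != labels[node]:
                labels[node] = best
                changed = True
        if not changed:
            break
    return labels


def cluster(vectors, threshold=0.8):
    """Return clusters as sorted lists of text indices."""
    labels = label_propagation(build_similarity_graph(vectors, threshold))
    groups = defaultdict(list)
    for node, label in labels.items():
        groups[label].append(node)
    return sorted(sorted(members) for members in groups.values())


# Toy vectors standing in for aligned embeddings: texts 0-2 would be the
# same topic written in three languages, texts 3-4 a second topic, and
# text 5 an unrelated contribution.
toy_vectors = [
    [1.0, 0.1, 0.0],
    [0.9, 0.2, 0.0],
    [0.95, 0.0, 0.1],
    [0.0, 1.0, 0.1],
    [0.1, 0.9, 0.0],
    [0.0, 0.1, 1.0],
]

print(cluster(toy_vectors))  # [[0, 1, 2], [3, 4], [5]]
```

Because aligned embeddings place translations of the same content close together, the graph step needs no language identification: texts in different languages end up in the same community whenever their vectors are similar enough.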
Pages: 694-700
Page count: 7
Related papers
50 in total
  • [41] Large Graph Clustering Using DCT-Based Graph Clustering
    Tsapanos, Nikolaos
    Tefas, Anastasios
    Nikolaidis, Nikolaos
    Pitas, Ioannis
    [J]. 2014 IEEE Symposium on Computational Intelligence in Big Data (CIBD), 2014, : 108 - 111
  • [42] Joint Image-Text News Topic Detection and Tracking by Multimodal Topic And-Or Graph
    Li, Weixin
    Joo, Jungseock
    Qi, Hang
    Zhu, Song-Chun
    [J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2017, 19 (02) : 367 - 381
  • [43] Topic Change Detection on Dialog Based Text
    Senel, Lutfi Kerem
    Yucesoy, Veysel
    Koc, Aykut
    Cukur, Tolga
    [J]. 2019 27TH SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE (SIU), 2019
  • [44] An ensemble clustering approach for topic discovery using implicit text segmentation
    Memon, Muhammad Qasim
    Lu, Yu
    Chen, Penghe
    Memon, Aasma
    Pathan, Muhammad Salman
    Zardari, Zulfiqar Ali
    [J]. JOURNAL OF INFORMATION SCIENCE, 2021, 47 (04) : 431 - 457
  • [45] Probabilistic topic modeling for short text based on word embedding networks
    Pita, Marcelo
    Nunes, Matheus
    Pappa, Gisele L.
    [J]. APPLIED INTELLIGENCE, 2022, 52 (15) : 17829 - 17844
  • [46] A Topic-based Dynamic Clustering Algorithm for Text Stream
    Rao, Y.
    Li, X. J.
    [J]. PROCEEDINGS OF THE 2015 INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND INDUSTRIAL ENGINEERING (AIIE 2015), 2015, 123 : 480 - 483
  • [47] Probabilistic topic modeling for short text based on word embedding networks
    Marcelo Pita
    Matheus Nunes
    Gisele L. Pappa
    [J]. Applied Intelligence, 2022, 52 : 17829 - 17844
  • [48] Corpus-based topic diffusion for short text clustering
    Zheng, Chu Tao
    Liu, Cheng
    Wong, Hau San
    [J]. NEUROCOMPUTING, 2018, 275 : 2444 - 2458
  • [49] A Topic Recognition Method of News Text Based on Word Embedding Enhancement
    Du, Qiming
    Li, Nan
    Liu, Wenfu
    Sun, Daozhu
    Yang, Shudan
    Yue, Feng
    [J]. COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE, 2022, 2022
  • [50] Attributed multiplex graph clustering: A heuristic clustering-aware network embedding approach
    Han, Beibei
    Wei, Yingmei
    Kang, Lai
    Wang, Qingyong
    Feng, Suru
    [J]. PHYSICA A-STATISTICAL MECHANICS AND ITS APPLICATIONS, 2022, 592