Graph and Embedding based Approach for Text Clustering: Topic Detection in a Large Multilingual Public Consultation

Authors
Stefanovitch, Nicolas [1]
Jacquet, Guillaume [1, 2]
de Longueville, Bertrand [1]
Affiliations
[1] European Commission, Joint Research Centre, Rome, Italy
[2] European Union, Publications Office, Alicante, Spain
Keywords
graph clustering; multilingual embeddings; semantic similarity; topic detection; public consultation
DOI
10.1145/3543873.3587627
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We present a novel algorithm for multilingual text clustering built upon two well-studied techniques: multilingual aligned embedding and community detection in graphs. The aim of our algorithm is to discover underlying topics in a multilingual dataset using clustering. We present both a numerical evaluation using silhouette and V-measure metrics, and a qualitative evaluation for which we propose a new systematic approach. Our algorithm presents robust overall performance and its results were empirically evaluated by an analyst. The work we present was done in the context of a large multilingual public consultation, for which our new algorithm was deployed and used on a daily basis.
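
The abstract names the two building blocks (aligned embeddings and graph community detection) but not the concrete algorithm, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes hand-made toy vectors in place of real multilingual aligned embeddings, a hypothetical cosine threshold of 0.8 for drawing graph edges, and a simple deterministic label propagation as a stand-in community detection step. The V-measure is computed from its standard definition (harmonic mean of homogeneity and completeness) against toy ground-truth topics.

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def similarity_graph(vectors, threshold):
    """Weighted adjacency dict: an edge links two texts whose similarity >= threshold."""
    adj = {i: {} for i in range(len(vectors))}
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            s = cosine(vectors[i], vectors[j])
            if s >= threshold:
                adj[i][j] = s
                adj[j][i] = s
    return adj

def label_propagation(adj, max_rounds=20):
    """Stand-in community detection: each node adopts the label carrying the
    largest total edge weight among its neighbours (ties go to the smaller label)."""
    labels = {node: node for node in adj}
    for _ in range(max_rounds):
        changed = False
        for node in sorted(adj):
            if not adj[node]:
                continue  # isolated node keeps its own label
            weight = Counter()
            for neighbour, w in adj[node].items():
                weight[labels[neighbour]] += w
            best = min(weight, key=lambda lab: (-weight[lab], lab))
            if best != labels[node]:
                labels[node] = best
                changed = True
        if not changed:
            break
    return [labels[i] for i in sorted(adj)]

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log(c / n) for c in Counter(labels).values())

def conditional_entropy(a, b):
    """H(a | b) estimated from label co-occurrence counts."""
    n = len(a)
    joint = Counter(zip(a, b))
    b_counts = Counter(b)
    return -sum(c / n * math.log(c / b_counts[y]) for (_, y), c in joint.items())

def v_measure(truth, pred):
    """Harmonic mean of homogeneity and completeness."""
    h_truth, h_pred = entropy(truth), entropy(pred)
    hom = 1.0 if h_truth == 0 else 1.0 - conditional_entropy(truth, pred) / h_truth
    com = 1.0 if h_pred == 0 else 1.0 - conditional_entropy(pred, truth) / h_pred
    return 0.0 if hom + com == 0 else 2 * hom * com / (hom + com)

# Toy vectors (assumed): three texts on topic A, three on topic B,
# imagined as written in different languages but embedded in one shared space.
vectors = [
    [0.90, 0.10, 0.00], [0.80, 0.20, 0.10], [0.85, 0.15, 0.05],
    [0.10, 0.90, 0.10], [0.00, 0.80, 0.20], [0.10, 0.85, 0.15],
]
truth = ["A", "A", "A", "B", "B", "B"]

graph = similarity_graph(vectors, threshold=0.8)
labels = label_propagation(graph)
score = v_measure(truth, labels)
print(labels, round(score, 3))
```

In practice the toy vectors would be replaced by the output of a multilingual sentence encoder, and the threshold and community detection method would need tuning on the actual dataset.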
Pages: 694-700
Page count: 7