Graph and Embedding based Approach for Text Clustering: Topic Detection in a Large Multilingual Public Consultation

Authors
Stefanovitch, Nicolas [1]
Jacquet, Guillaume [1, 2]
de Longueville, Bertrand [1]
Affiliations
[1] European Commission, Joint Research Centre, Rome, Italy
[2] European Union, Publications Office, Alicante, Spain
Keywords
graph clustering; multilingual embeddings; semantic similarity; topic detection; public consultation
DOI
10.1145/3543873.3587627
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We present a novel algorithm for multilingual text clustering built upon two well-studied techniques: multilingual aligned embeddings and community detection in graphs. The algorithm aims to discover the underlying topics in a multilingual dataset through clustering. We provide both a numerical evaluation, using the silhouette and V-measure metrics, and a qualitative evaluation, for which we propose a new systematic approach. The algorithm shows robust overall performance, and its results were empirically evaluated by an analyst. The work was carried out in the context of a large multilingual public consultation, for which the algorithm was deployed and used on a daily basis.
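The general idea described in the abstract (embed texts in a shared multilingual space, link similar texts in a graph, then detect communities) can be illustrated with a minimal sketch. This is not the authors' implementation: this record does not say which embedding model or community detection algorithm the paper uses. The sketch assumes hand-made toy vectors in place of real multilingual embeddings, a hypothetical cosine threshold of 0.8, and a simple deterministic label propagation as a stand-in community detection step. All function names are illustrative.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similarity_graph(vectors, threshold):
    """Weighted adjacency dict: an edge links i and j when cosine >= threshold."""
    adj = {i: {} for i in range(len(vectors))}
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            sim = cosine(vectors[i], vectors[j])
            if sim >= threshold:
                adj[i][j] = sim
                adj[j][i] = sim
    return adj


def label_propagation(adj, max_iter=100):
    """Stand-in community detection: each node repeatedly adopts the label
    with the largest total edge weight among its neighbours. Nodes are
    visited in a fixed order and ties go to the smallest label, so the
    result is deterministic."""
    labels = {node: node for node in adj}
    for _ in range(max_iter):
        changed = False
        for node in sorted(adj):
            if not adj[node]:
                continue  # isolated node keeps its own label
            weight = defaultdict(float)
            for neighbour, w in adj[node].items():
                weight[labels[neighbour]] += w
            best = max(weight.values())
            new_label = min(l for l, w in weight.items() if w == best)
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:
            break
    return labels


# Toy vectors standing in for aligned multilingual embeddings (assumed data):
# items 0-2 represent one topic, items 3-5 another.
embeddings = [
    [1.0, 0.1, 0.0],
    [0.9, 0.2, 0.0],
    [0.95, 0.0, 0.1],
    [0.0, 1.0, 0.1],
    [0.1, 0.9, 0.0],
    [0.0, 0.95, 0.2],
]

graph = similarity_graph(embeddings, threshold=0.8)  # threshold is an assumption
labels = label_propagation(graph)

clusters = defaultdict(list)
for node, label in labels.items():
    clusters[label].append(node)
print(sorted(clusters.values()))
```

In a real setting the toy vectors would come from a multilingual sentence encoder, and the resulting cluster labels could be scored with the silhouette and V-measure metrics mentioned in the abstract.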
Pages: 694-700
Number of pages: 7