A clustering-based topic model using word networks and word embeddings

Cited by: 5
Authors
Mu, Wenchuan [1]
Lim, Kwan Hui [2]
Liu, Junhua [2, 3]
Karunasekera, Shanika [4]
Falzon, Lucia [4]
Harwood, Aaron [4]
Affiliations
[1] Singapore Univ Technol & Design, Engn Prod Dev Pillar, Singapore, Singapore
[2] Singapore Univ Technol & Design, Informat Syst Technol & Design Pillar, Singapore, Singapore
[3] Forth AI, Singapore, Singapore
[4] Univ Melbourne, Sch Comp & Informat Syst, Melbourne, Vic, Australia
Keywords
Topic modelling; Clustering; Word embedding; Twitter; Microblogs; Social networks
DOI
10.1186/s40537-022-00585-4
Chinese Library Classification (CLC) number
TP301 [Theory and methods]
Discipline classification code
081202
Abstract
Online social networking services like Twitter are frequently used for discussions on numerous topics of interest, which range from mainstream and popular topics (e.g., music and movies) to niche and specialized topics (e.g., politics). Because of the popularity of such services and the large number of tweets they generate, automatically modelling and determining the numerous discussion topics is a challenging task. Adding to this complexity, these topics must be identified without prior knowledge of either their types or their number, and existing models require technical expertise to tune their numerous parameters. To address this challenge, we develop the Clustering-based Topic Modelling (ClusTop) algorithm, which first constructs different types of word networks based on different types of n-gram co-occurrence and word embedding distances. Using these word networks, ClusTop then automatically determines the discussion topics using community detection approaches. In contrast to traditional topic models, ClusTop does not require the tuning or setting of numerous parameters and instead uses community detection approaches to automatically determine the appropriate number of topics. The ClusTop algorithm is also able to capture the syntactic meaning in tweets via the use of bigrams, trigrams, other word combinations and word embedding techniques in constructing the word network graph, and utilizes edge weights based on word embedding. Using three Twitter datasets with labelled crises and events as topics, we show that ClusTop outperforms various traditional baselines in terms of topic coherence, pointwise mutual information, precision, recall and F-score.
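The pipeline described in the abstract (build a word network from co-occurrences, then let graph structure decide how many topics exist) can be illustrated with a minimal sketch. This is not the ClusTop implementation: the function names `build_word_network` and `detect_topics` and the toy tweets are hypothetical, only bigram co-occurrence is used (no trigrams or word embeddings), and plain connected components stand in for a real community detection method.

```python
from collections import Counter, defaultdict


def build_word_network(docs):
    """Weight each word pair by how often the two words are adjacent (a bigram)."""
    weights = Counter()
    for doc in docs:
        tokens = doc.lower().split()
        for a, b in zip(tokens, tokens[1:]):
            if a != b:  # skip self-loops from repeated words
                weights[frozenset((a, b))] += 1
    return weights


def detect_topics(weights):
    """Group words into connected components of the word network.

    Assumption: connected components are a crude stand-in for the
    community detection step; they only illustrate that the number
    of groups is an output, not a parameter.
    """
    adjacency = defaultdict(set)
    for edge in weights:
        a, b = tuple(edge)
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen, topics = set(), []
    for start in sorted(adjacency):  # sorted for a deterministic order
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:  # iterative depth-first traversal
            word = stack.pop()
            if word in component:
                continue
            component.add(word)
            stack.extend(adjacency[word] - component)
        seen |= component
        topics.append(sorted(component))
    return topics


# Hypothetical toy tweets about two unrelated events.
docs = [
    "flood warning river",
    "river flood rescue",
    "earthquake damage city",
    "city earthquake relief",
]
network = build_word_network(docs)
topics = detect_topics(network)
print(topics)
```

On the toy input the words fall into two groups, one per event, without the number of topics being supplied. On real tweets a single shared word would merge everything into one component, which is why a modularity-based community detection method is needed in practice.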
Pages: 38
Journal
Journal of Big Data, 9