Embedding Semantic Anchors to Guide Topic Models on Short Text Corpora

Cited: 3
Authors
Steuber, Florian [1 ]
Schneider, Sinclair [1 ]
Schoenfeld, Mirco [2 ]
Affiliations
[1] Univ Bundeswehr Munchen, Res Inst CODE, Neubiberg, Germany
[2] Univ Bayreuth, Bayreuth, Germany
Keywords
Topic modeling; Short text; Word embedding; Transfer learning; Big data
DOI
10.1016/j.bdr.2021.100293
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Posts on the social media platform Twitter are written in a short, simple style rather than extensively and elaborately. Moreover, the core message of a post is often condensed into characteristic phrases called hashtags, which convey the semantics of a post or tie it to a specific topic. In this paper, we propose several approaches that use hashtags and their surrounding text to improve topic modeling of short texts. We apply transfer learning: a word embedding pre-trained on hashtags is used to derive preliminary topics. These serve as supervising information, or seed topics, and are passed to Archetypal LDA (A-LDA), a recent variant of Latent Dirichlet Allocation. We demonstrate the effectiveness of our approaches on a large corpus of Twitter posts. Our approaches improve the topic model's quality across various quantitative metrics. Moreover, the algorithms used to extract seed topics can serve as lightweight topic models in their own right. Our approaches thus create additional analytical opportunities and can help to develop a more detailed understanding of what people are talking about on social media. By using big data, in the form of millions of tweets, for preprocessing and fine-tuning, we enable the classification algorithm to produce topics that are highly coherent to human readers. (C) 2021 Elsevier Inc. All rights reserved.
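To make the pipeline concrete, the following minimal Python sketch illustrates the seed-topic extraction step the abstract describes: hashtag embeddings are clustered so that each cluster yields one preliminary (seed) topic. This is not the authors' implementation; the tiny corpus, the choice of gensim Word2Vec and scikit-learn KMeans, and the cluster count k are illustrative assumptions (the paper uses an embedding pre-trained on millions of tweets and passes the resulting seeds to A-LDA).

from gensim.models import Word2Vec
from sklearn.cluster import KMeans
import numpy as np

# Toy tokenized tweets; hashtags are kept as tokens with their '#' prefix.
tweets = [
    ["wildfire", "smoke", "#climate", "#heatwave"],
    ["rising", "sea", "levels", "#climate", "#flooding"],
    ["new", "phone", "benchmark", "#tech", "#gadgets"],
    ["chip", "shortage", "hits", "#tech", "#hardware"],
]

# Stand-in for the pre-trained hashtag embedding (transfer learning);
# trained here on the toy corpus itself so the sketch is self-contained.
w2v = Word2Vec(tweets, vector_size=32, min_count=1, seed=42)

hashtags = sorted({t for doc in tweets for t in doc if t.startswith("#")})
vectors = np.stack([w2v.wv[h] for h in hashtags])

# Cluster hashtag vectors; each cluster becomes one seed topic.
k = 2  # number of seed topics -- an assumption for this toy example
labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(vectors)

seed_topics = {c: [h for h, lbl in zip(hashtags, labels) if lbl == c]
               for c in range(k)}
print(seed_topics)
# The resulting hashtag groups would supervise A-LDA as seed topics.

Groupings like these can also serve directly as the lightweight topic model the abstract mentions, before any A-LDA training takes place.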
Pages: 13
Related Papers
50 records in total
  • [1] Short Text Embedding for Clustering based on Word and Topic Semantic Information
    Chen, Ziheng
    Ren, Jiangtao
    2019 IEEE INTERNATIONAL CONFERENCE ON DATA SCIENCE AND ADVANCED ANALYTICS (DSAA 2019), 2019, : 61 - 70
  • [2] Integrating Text Classification into Topic Discovery Using Semantic Embedding Models
    Lezama-Sanchez, Ana Laura
    Vidal, Mireya Tovar
    Reyes-Ortiz, Jose A.
    APPLIED SCIENCES-BASEL, 2023, 13 (17)
  • [3] Spatial Temporal Topic Embedding: A Semantic Modeling Method for Short Text in Social Network
    Yang, Congxian
    Du, Junping
    Kou, Feifei
    Lee, Jangmyung
    ARTIFICIAL INTELLIGENCE (ICAI 2018), 2018, 888 : 198 - 210
  • [4] A Comparative Study of Methods for Visualizable Semantic Embedding of Small Text Corpora
    Choudhary, Rishabh
    Doboli, Simona
    Minai, Ali A.
    2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2021
  • [5] Learning Author-Topic Models from Text Corpora
    Rosen-Zvi, Michal
    Chemudugunta, Chaitanya
    Griffiths, Thomas
    Smyth, Padhraic
    Steyvers, Mark
    ACM TRANSACTIONS ON INFORMATION SYSTEMS, 2010, 28 (01)
  • [6] Semantic Augmented Topic Model over Short Text
    Li, Lingyun
    Sun, Yawei
    Wang, Cong
    PROCEEDINGS OF 2018 5TH IEEE INTERNATIONAL CONFERENCE ON CLOUD COMPUTING AND INTELLIGENCE SYSTEMS (CCIS), 2018, : 652 - 656
  • [7] Unsupervised Anomaly Detection in Multi-Topic Short-Text Corpora
    Ait-Saada, Mira
    Nadif, Mohamed
    17TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EACL 2023, 2023, : 1392 - 1403
  • [8] Probabilistic topic modeling for short text based on word embedding networks
    Pita, Marcelo
    Nunes, Matheus
    Pappa, Gisele L.
    APPLIED INTELLIGENCE, 2022, 52 (15) : 17829 - 17844
  • [9] Combine Topic Modeling with Semantic Embedding: Embedding Enhanced Topic Model
    Zhang, Peng
    Wang, Suge
    Li, Deyu
    Li, Xiaoli
    Xu, Zhikang
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2020, 32 (12) : 2322 - 2335