Embedding Semantic Anchors to Guide Topic Models on Short Text Corpora

Cited by: 3
Authors
Steuber, Florian [1]
Schneider, Sinclair [1]
Schoenfeld, Mirco [2]
Affiliations
[1] Univ Bundeswehr Munchen, Res Inst CODE, Neubiberg, Germany
[2] Univ Bayreuth, Bayreuth, Germany
Keywords
Topic modeling; Short text; Word embedding; Transfer learning; Big data
DOI
10.1016/j.bdr.2021.100293
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Documents on the social media platform Twitter are written in a short, simple style rather than at length and in detail. Further, the core message of a post is often encoded in characteristic phrases called hashtags. These hashtags illustrate the semantics of a post or tie it to a specific topic. In this paper, we propose several approaches to using hashtags and their surrounding texts to improve topic modeling of short texts. We use transfer learning by applying a pre-trained word embedding of hashtags to derive preliminary topics. These function as supervising information, or seed topics, and are passed to Archetypal LDA (A-LDA), a recent variant of Latent Dirichlet Allocation. We demonstrate the effectiveness of our approach using a large corpus of Twitter posts as an example. Our approaches improve the quality of the topic model as measured by various quantitative metrics. Moreover, the presented algorithms used to extract seed topics can themselves be used as a form of lightweight topic model. Hence, our approaches create additional analytical opportunities and can help to gain a more detailed understanding of what people are talking about on social media. By using big data, in the form of millions of tweets, for preprocessing and fine-tuning, we enable the classification algorithm to produce topics that are very coherent to the reader. © 2021 Elsevier Inc. All rights reserved.
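The abstract does not say how preliminary topics are derived from the hashtag embedding, so the following is only a minimal illustrative sketch, not the authors' algorithm. It assumes that hashtags with similar vectors can be grouped greedily by cosine similarity. The vectors, the `seed_topics` helper, and the `threshold` parameter are all hypothetical. Passing the resulting groups to A-LDA is not shown, since that interface is not described here.

```python
import math

# Hand-made toy vectors standing in for a pre-trained hashtag embedding.
# These values are hypothetical and chosen only for illustration.
EMBEDDING = {
    "#football": [0.9, 0.1, 0.0],
    "#soccer": [0.8, 0.2, 0.1],
    "#election": [0.1, 0.9, 0.1],
    "#vote": [0.0, 0.8, 0.2],
    "#recipe": [0.1, 0.1, 0.9],
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def seed_topics(embedding, threshold=0.9):
    """Greedily group hashtags into candidate seed topics.

    Hashtags are visited in sorted order. Each one joins the first group
    whose first member is at least `threshold` similar to it; otherwise
    it starts a new group of its own.
    """
    topics = []
    for tag in sorted(embedding):
        for topic in topics:
            if cosine(embedding[tag], embedding[topic[0]]) >= threshold:
                topic.append(tag)
                break
        else:
            topics.append([tag])
    return topics


if __name__ == "__main__":
    for topic in seed_topics(EMBEDDING):
        print(topic)
```

On these toy vectors the sports, politics, and cooking hashtags fall into three separate groups. With a real embedding, the threshold and the grouping strategy would need tuning against the corpus.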
Pages: 13