Incorporating Embedding to Topic Modeling for More Effective Short Text Analysis

Cited by: 1
Authors
Rashid, Junaid [1]
Kim, Jungeun [2]
Naseem, Usman [3]
Affiliations
[1] Sejong Univ, Dept Data Sci, Seoul, South Korea
[2] Kongju Natl Univ, Dept Software, Cheonan, South Korea
[3] Univ Sydney, Sch Comp Sci, Sydney, NSW, Australia
Funding
National Research Foundation, Singapore;
Keywords
Topic Modeling; Clustering; Short Text; Classification; Coherence;
DOI
10.1145/3543873.3587316
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
With the growing volume of short text content on websites, analyzing and understanding these texts has become a crucial task. Topic modeling is a widely used technique for analyzing short text documents and uncovering their underlying topics. However, traditional topic models struggle to extract topics accurately from short texts because the texts are sparse and contain little content. To address these issues, we propose an embedding-based topic modeling (EmTM) approach that incorporates word embeddings and hierarchical clustering to identify significant topics. Experiments on two datasets of web short texts, Snippet and News, demonstrate the effectiveness of EmTM. The results show that EmTM outperforms baseline topic models in both classification accuracy and topic coherence.
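The abstract names two ingredients, word embeddings and hierarchical clustering, without giving algorithmic detail. The following is a minimal sketch of that general idea only, not the authors' EmTM method: it groups words into topic-like clusters by average-linkage agglomerative clustering over cosine distance. The vectors in `TOY_EMBEDDINGS` are invented for illustration, and the function names (`cosine_distance`, `average_linkage`, `cluster_words`) are hypothetical; a real pipeline would presumably load pretrained embeddings instead.

```python
import math

# Hand-made 3-dimensional vectors, invented purely for illustration.
# They are NOT real pretrained embeddings and do not come from the paper.
TOY_EMBEDDINGS = {
    "goal":   [0.90, 0.10, 0.00],
    "match":  [0.80, 0.20, 0.10],
    "team":   [0.85, 0.15, 0.05],
    "stock":  [0.10, 0.90, 0.10],
    "market": [0.05, 0.95, 0.00],
    "bank":   [0.15, 0.85, 0.10],
}


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def average_linkage(cluster_a, cluster_b, vectors):
    """Mean pairwise cosine distance between the words of two clusters."""
    total = sum(
        cosine_distance(vectors[a], vectors[b])
        for a in cluster_a
        for b in cluster_b
    )
    return total / (len(cluster_a) * len(cluster_b))


def cluster_words(vectors, num_topics):
    """Agglomerative clustering: start with one cluster per word and
    repeatedly merge the closest pair until num_topics clusters remain."""
    clusters = [[word] for word in sorted(vectors)]
    while len(clusters) > num_topics:
        best = None  # (distance, index_i, index_j) of the closest pair so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = average_linkage(clusters[i], clusters[j], vectors)
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(cluster) for cluster in clusters]


topics = cluster_words(TOY_EMBEDDINGS, num_topics=2)
for words in topics:
    print(words)
```

With these toy vectors the sports-like words and the finance-like words end up in separate clusters, which is the sense in which nearby embeddings can stand in for a "topic". How EmTM assigns documents to topics or selects the number of clusters is not stated in the abstract and is not modeled here.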
Pages: 73-76
Number of pages: 4