PSLDA: a novel supervised pseudo document-based topic model for short texts

Cited by: 4
Authors
Sun, Mingtao [1]
Zhao, Xiaowei [2]
Lin, Jingjing [3]
Jing, Jian [2]
Wang, Deqing [2]
Jia, Guozhu [1]
Affiliations
[1] Beihang Univ, Sch Econ & Management, Beijing 100191, Peoples R China
[2] Beihang Univ, Sch Comp Sci, Beijing 100191, Peoples R China
[3] Beihang Univ, Sch Instrumentat & Optoelect Engn, Beijing 100191, Peoples R China
Keywords
supervised topic model; short text; pseudo-document
DOI
10.1007/s11704-021-0606-3
CLC number (Chinese Library Classification)
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
Online social media applications such as Twitter and Weibo have produced a huge volume of short texts. However, efficiently mining semantic topics from short texts remains a challenging problem because of the sparseness of word occurrences and the diversity of topics. To address these problems, we propose a novel supervised pseudo-document-based maximum entropy discrimination latent Dirichlet allocation model (PSLDA). Specifically, we first assume that short texts are generated from latent pseudo-documents of normal length, and that topic distributions are sampled from these pseudo-documents. Because the model implicitly aggregates short texts into longer, higher-level pseudo-documents, it reduces both the sparseness of word occurrences and the diversity of topics. To make full use of the labeled information in the training data, we further introduce labels into the model, yielding a supervised topic model that learns a reasonable distribution of topics. Extensive experiments demonstrate that the proposed method achieves better performance than several state-of-the-art methods.
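To make the generative story in the abstract more concrete, here is a minimal sketch of how a corpus could be sampled from a pseudo-document-based supervised topic model. It is not the authors' PSLDA implementation and performs no inference or max-margin training. Several details are assumptions: each short text is attached to exactly one pseudo-document, topic-word and pseudo-document-topic distributions have symmetric Dirichlet priors (hypothetical hyperparameters `alpha`, `beta`, `lam`), and labels follow a MedLDA-style linear rule over the empirical topic frequencies, with random placeholder weights `eta` standing in for learned ones. All function and field names are hypothetical.

```python
import random
from dataclasses import dataclass


def sample_dirichlet(alpha: list[float], rng: random.Random) -> list[float]:
    """Draw one sample from Dirichlet(alpha) by normalizing Gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def categorical(probs: list[float], rng: random.Random) -> int:
    """Draw an index with probability proportional to probs."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


@dataclass
class ShortText:
    pseudo_doc: int    # latent pseudo-document this short text is drawn from
    topics: list[int]  # topic assignment of each word
    words: list[int]   # word ids
    label: int         # label from the (assumed) linear prediction rule


def generate_corpus(n_texts, n_pseudo, n_topics, vocab_size, n_labels,
                    text_len=8, alpha=0.1, beta=0.1, lam=1.0, seed=0):
    rng = random.Random(seed)
    # Topic-word distributions: phi_k ~ Dir(beta).
    phi = [sample_dirichlet([beta] * vocab_size, rng) for _ in range(n_topics)]
    # Topic proportions live at the pseudo-document level: theta_l ~ Dir(alpha).
    theta = [sample_dirichlet([alpha] * n_topics, rng) for _ in range(n_pseudo)]
    # Mixing weights over pseudo-documents: psi ~ Dir(lam).
    psi = sample_dirichlet([lam] * n_pseudo, rng)
    # Placeholder per-label weights; a MedLDA-style model would learn these
    # with a max-margin objective instead of sampling them at random.
    eta = [[rng.gauss(0.0, 1.0) for _ in range(n_topics)] for _ in range(n_labels)]

    corpus = []
    for _ in range(n_texts):
        # Assign the short text to one pseudo-document.
        pd_idx = categorical(psi, rng)
        # Every word's topic comes from the pseudo-document's distribution,
        # so short texts sharing a pseudo-document share topic statistics.
        z = [categorical(theta[pd_idx], rng) for _ in range(text_len)]
        w = [categorical(phi[k], rng) for k in z]
        # Empirical topic frequencies of this text.
        zbar = [z.count(k) / text_len for k in range(n_topics)]
        # Label = argmax_c eta_c . zbar (linear discriminant on topics).
        scores = [sum(e * f for e, f in zip(eta_c, zbar)) for eta_c in eta]
        label = max(range(n_labels), key=lambda c: scores[c])
        corpus.append(ShortText(pd_idx, z, w, label))
    return corpus


if __name__ == "__main__":
    for text in generate_corpus(n_texts=5, n_pseudo=2, n_topics=3,
                                vocab_size=20, n_labels=2):
        print(text.pseudo_doc, text.label, text.words)
```

The point of the sketch is where the topic proportions are drawn: `theta` is indexed by pseudo-document rather than by short text, which is how aggregation into longer pseudo-documents can ease sparsity when only a few words are observed per text.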
Pages: 10
Related papers
50 in total
  • [31] A Multilevel Clustering Model for Coherent Topic Discovery in Short Texts
    Maithya, Emmanuel Muthoka
    Nderu, Lawrence
    Njagi, Dennis
    [J]. 2022 IST-AFRICA CONFERENCE, 2022
  • [32] Robust Word-Network Topic Model for Short Texts
    Wang, Fei
    Liu, Rui
    Zuo, Yuan
    Zhang, Hui
    Zhang, He
    Wu, Junjie
    [J]. 2016 IEEE 28TH INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE (ICTAI 2016), 2016: 852-856
  • [33] A New Sentiment and Topic Model for Short Texts on Social Media
    Xu, Kang
    Huang, Junheng
    Qi, Guilin
    [J]. SEMANTIC TECHNOLOGY, JIST 2017, 2017, 10675: 183-198
  • [34] A Novel Neural Topic Model and Its Supervised Extension
    Cao, Ziqiang
    Li, Sujian
    Liu, Yang
    Li, Wenjie
    Ji, Heng
    [J]. PROCEEDINGS OF THE TWENTY-NINTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2015: 2210-2216
  • [35] A Document-Based Neural Relevance Model for Effective Clinical Decision Support
    Ran, Yanhua
    He, Ben
    Hui, Kai
    Xu, Jungang
    Sun, Le
    [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE (BIBM), 2017: 798-804
  • [36] Combined document embedding and hierarchical topic model for social media texts analysis
    Uteuov, Amir
    Kalyuzhnaya, Anna
    [J]. 7TH INTERNATIONAL YOUNG SCIENTISTS CONFERENCE ON COMPUTATIONAL SCIENCE, YSC2018, 2018, 136: 293-303
  • [37] A topic-based document correlation model
    Jia, Xi-Ping
    Peng, Hong
    Zheng, Qj-Lun
    Jiang, Zhuo-Lin
    Li, Zhao
    [J]. PROCEEDINGS OF 2008 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-7, 2008: 2487-2491
  • [38] A Topic based Document Relevance Ranking Model
    Gao, Yang
    Xu, Yue
    Li, Yuefeng
    [J]. WWW'14 COMPANION: PROCEEDINGS OF THE 23RD INTERNATIONAL CONFERENCE ON WORLD WIDE WEB, 2014: 271-272
  • [39] Neural labeled LDA: a topic model for semi-supervised document classification
    Wang, Wei
    Guo, Bing
    Shen, Yan
    Yang, Han
    Chen, Yaosen
    Suo, Xinhua
    [J]. SOFT COMPUTING, 2021, 25(23): 14561-14571