PSLDA: a novel supervised pseudo document-based topic model for short texts

Cited by: 4
Authors
Sun, Mingtao [1]
Zhao, Xiaowei [2]
Lin, Jingjing [3]
Jing, Jian [2]
Wang, Deqing [2]
Jia, Guozhu [1]
Affiliations
[1] Beihang University, School of Economics and Management, Beijing 100191, People's Republic of China
[2] Beihang University, School of Computer Science, Beijing 100191, People's Republic of China
[3] Beihang University, School of Instrumentation and Optoelectronic Engineering, Beijing 100191, People's Republic of China
Keywords
supervised topic model; short text; pseudo-document
DOI
10.1007/s11704-021-0606-3
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
Online social media applications such as Twitter and Weibo produce a huge volume of short texts. However, efficiently mining semantic topics from short texts remains challenging because word occurrences are sparse and topics are diverse. To address these problems, we propose a novel supervised pseudo-document-based maximum entropy discrimination latent Dirichlet allocation model (PSLDA for short). Specifically, we first assume that short texts are generated from normal-sized latent pseudo-documents and that their topic distributions are sampled from those pseudo-documents. Because the model implicitly aggregates short texts into longer, higher-level pseudo-documents, it alleviates both the sparsity of word occurrences and the diversity of topics. To make full use of the label information in the training data, we then incorporate labels into the model, yielding a supervised topic model that learns a more reasonable topic distribution. Extensive experiments demonstrate that the proposed method outperforms several state-of-the-art methods.
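The abstract states the pseudo-document assumption only in prose. The sketch below shows one way such a generative assumption can be written down, assuming a process in the spirit of pseudo-document-based topic models: symmetric Dirichlet priors, a corpus-level distribution `psi` over pseudo-documents, exactly one pseudo-document per short text, and per-word topics drawn from that pseudo-document's topic distribution. For the supervised side it only shows a linear argmax prediction rule over a text's empirical topic proportions, of the kind used in maximum-entropy-discrimination (MedLDA-style) models. All function names, hyperparameter values and the weight matrix `eta` are hypothetical; PSLDA's actual inference procedure and the max-margin training of the label weights are not described in the abstract and are not reproduced here.

```python
import random


def sample_dirichlet(alpha, k, rng):
    """Draw a k-dimensional symmetric Dirichlet(alpha) vector via normalized Gamma draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    if total == 0.0:  # guard against floating-point underflow for very small alpha
        return [1.0 / k] * k
    return [d / total for d in draws]


def generate_short_texts(num_texts, num_pseudo_docs, num_topics, vocab,
                         text_len, alpha=0.1, beta=0.1, lam=1.0, seed=0):
    """Sample short texts under an assumed pseudo-document generative process.

    Each short text is attached to ONE latent pseudo-document; every word's
    topic is drawn from that pseudo-document's topic distribution, so texts
    sharing a pseudo-document share topic statistics (implicit aggregation).
    """
    rng = random.Random(seed)
    # Topic-word distributions phi_k over the vocabulary.
    phi = [sample_dirichlet(beta, len(vocab), rng) for _ in range(num_topics)]
    # Corpus-level distribution over pseudo-documents.
    psi = sample_dirichlet(lam, num_pseudo_docs, rng)
    # One topic distribution theta_l per pseudo-document (not per short text).
    theta = [sample_dirichlet(alpha, num_topics, rng) for _ in range(num_pseudo_docs)]
    corpus = []
    for _ in range(num_texts):
        l = rng.choices(range(num_pseudo_docs), weights=psi)[0]
        topics = rng.choices(range(num_topics), weights=theta[l], k=text_len)
        words = [rng.choices(vocab, weights=phi[z])[0] for z in topics]
        corpus.append({"pseudo_doc": l, "topics": topics, "words": words})
    return corpus


def empirical_topic_proportions(topics, num_topics):
    """z_bar: fraction of a text's tokens assigned to each topic."""
    counts = [0] * num_topics
    for z in topics:
        counts[z] += 1
    return [c / len(topics) for c in counts]


def predict_label(z_bar, eta):
    """Linear discriminant argmax_y eta[y] . z_bar; eta is assumed already trained."""
    scores = [sum(w * p for w, p in zip(row, z_bar)) for row in eta]
    return max(range(len(eta)), key=lambda y: scores[y])


if __name__ == "__main__":
    vocab = ["game", "score", "team", "vote", "party", "law"]
    corpus = generate_short_texts(num_texts=20, num_pseudo_docs=3,
                                  num_topics=2, vocab=vocab, text_len=5)
    z_bar = empirical_topic_proportions(corpus[0]["topics"], num_topics=2)
    eta = [[1.0, -1.0], [-1.0, 1.0]]  # hypothetical, hand-set weights for 2 labels
    print(corpus[0]["words"], z_bar, predict_label(z_bar, eta))
```

Attaching the topic distribution to the pseudo-document rather than to each short text is what lets many five-word texts pool their topic evidence; in a per-text model each `theta` would be estimated from only a handful of tokens.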
Pages: 10