PSLDA: a novel supervised pseudo document-based topic model for short texts

Cited by: 4
Authors
Sun, Mingtao [1]
Zhao, Xiaowei [2]
Lin, Jingjing [3]
Jing, Jian [2]
Wang, Deqing [2]
Jia, Guozhu [1]
Affiliations
[1] Beihang Univ, Sch Econ & Management, Beijing 100191, Peoples R China
[2] Beihang Univ, Sch Comp Sci, Beijing 100191, Peoples R China
[3] Beihang Univ, Sch Instrumentat & Optoelect Engn, Beijing 100191, Peoples R China
Keywords
supervised topic model; short text; pseudo-document
DOI
10.1007/s11704-021-0606-3
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Online social media applications such as Twitter and Weibo have produced a huge volume of short texts. However, mining semantic topics from short texts efficiently remains a challenging problem because of the sparseness of word occurrence and the diversity of topics. To address these problems, we propose a novel supervised pseudo-document-based maximum entropy discrimination latent Dirichlet allocation model (PSLDA for short). Specifically, we first assume that short texts are generated from normal-sized latent pseudo documents, and that the topic distributions are sampled from those pseudo documents. In this way, the model reduces the sparseness of word occurrence and the diversity of topics, because it implicitly aggregates short texts into longer, higher-level pseudo documents. To make full use of the label information in the training data, we introduce labels into the model and further propose a supervised topic model that learns a reasonable distribution of topics. Extensive experiments demonstrate that the proposed method outperforms several state-of-the-art methods.
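The pseudo-document assumption in the abstract can be illustrated with a minimal generative sketch. This is not the authors' PSLDA implementation: the supervised, maximum entropy discrimination part is omitted entirely, and the function names (`sample_dirichlet`, `generate_short_texts`), the priors, and all hyperparameter values are hypothetical choices made for illustration. It only shows the general idea that each short text picks one latent pseudo document and draws its word topics from that pseudo document's topic distribution.

```python
import random


def sample_dirichlet(rng, concentration, size):
    """Draw one symmetric Dirichlet sample by normalizing Gamma draws."""
    draws = [rng.gammavariate(concentration, 1.0) for _ in range(size)]
    total = sum(draws)
    return [d / total for d in draws]


def generate_short_texts(n_texts, n_pseudo_docs, n_topics, vocab_size,
                         words_per_text=8, alpha=0.5, beta=0.1, seed=0):
    """Generate synthetic short texts from latent pseudo documents.

    Assumed (hypothetical) process, unsupervised part only:
      1. each topic has a word distribution phi[k];
      2. each pseudo document has a topic distribution theta[p];
      3. each short text picks one pseudo document p;
      4. each word draws a topic from theta[p], then a word from phi[topic].
    """
    rng = random.Random(seed)
    phi = [sample_dirichlet(rng, beta, vocab_size) for _ in range(n_topics)]
    theta = [sample_dirichlet(rng, alpha, n_topics)
             for _ in range(n_pseudo_docs)]
    # Distribution over pseudo documents, shared by all short texts.
    psi = sample_dirichlet(rng, 1.0, n_pseudo_docs)

    texts, assignments = [], []
    for _text in range(n_texts):
        p = rng.choices(range(n_pseudo_docs), weights=psi)[0]
        words = []
        for _word in range(words_per_text):
            k = rng.choices(range(n_topics), weights=theta[p])[0]
            words.append(rng.choices(range(vocab_size), weights=phi[k])[0])
        texts.append(words)
        assignments.append(p)
    return texts, assignments, theta, phi


if __name__ == "__main__":
    texts, assignments, theta, phi = generate_short_texts(
        n_texts=5, n_pseudo_docs=2, n_topics=3, vocab_size=20)
    for words, p in zip(texts, assignments):
        print(f"pseudo document {p}: word ids {words}")
```

Because topic distributions belong to the pseudo documents rather than to individual short texts, many short texts share the same `theta[p]`, which is one way to read the abstract's claim that aggregation reduces sparseness.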
Pages: 10