PSLDA: a novel supervised pseudo document-based topic model for short texts

Cited by: 4
Authors
Sun, Mingtao [1]
Zhao, Xiaowei [2]
Lin, Jingjing [3]
Jing, Jian [2]
Wang, Deqing [2]
Jia, Guozhu [1]
Affiliations
[1] Beihang Univ, Sch Econ & Management, Beijing 100191, Peoples R China
[2] Beihang Univ, Sch Comp Sci, Beijing 100191, Peoples R China
[3] Beihang Univ, Sch Instrumentat & Optoelect Engn, Beijing 100191, Peoples R China
Keywords
supervised topic model; short text; pseudo-document
DOI
10.1007/s11704-021-0606-3
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Online social media applications such as Twitter and Weibo generate a huge volume of short texts. However, efficiently mining semantic topics from short texts remains challenging because word occurrences are sparse and topics are diverse. To address these problems, we propose a novel supervised pseudo-document-based maximum entropy discrimination latent Dirichlet allocation model (PSLDA for short). Specifically, we first assume that short texts are generated from normal-size latent pseudo documents, and that their topic distributions are sampled from those pseudo documents. In this way, the model reduces the sparseness of word occurrences and the diversity of topics because it implicitly aggregates short texts into longer, higher-level pseudo documents. To make full use of the label information in the training data, we introduce labels into the model and further propose a supervised topic model that learns a reasonable distribution of topics. Extensive experiments demonstrate that the proposed method outperforms some state-of-the-art methods.
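The pseudo-document assumption described in the abstract can be illustrated with a minimal generative sketch. This is not the authors' PSLDA implementation: it omits the labels and the maximum entropy discrimination component entirely, and the function name `generate_short_texts`, the hyperparameters `alpha` and `beta`, and all sizes are assumptions chosen for illustration. It only shows the idea that each short text picks one latent pseudo document and draws its word topics from that pseudo document's topic distribution.

```python
import numpy as np

def generate_short_texts(n_docs=20, n_pseudo=4, n_topics=3, vocab_size=30,
                         doc_len=6, alpha=0.5, beta=0.1, seed=0):
    """Sample short texts under a simple pseudo-document assumption (illustrative only)."""
    rng = np.random.default_rng(seed)
    # One word distribution per topic (hypothetical symmetric Dirichlet prior).
    phi = rng.dirichlet(np.full(vocab_size, beta), size=n_topics)
    # One topic distribution per pseudo document, not per short text.
    theta = rng.dirichlet(np.full(n_topics, alpha), size=n_pseudo)
    # Distribution over pseudo documents, shared by all short texts.
    psi = rng.dirichlet(np.ones(n_pseudo))
    # Each short text is assigned to exactly one latent pseudo document.
    assignments = rng.choice(n_pseudo, size=n_docs, p=psi)
    docs = []
    for pseudo_id in assignments:
        # Topics come from the pseudo document's topic distribution.
        topics = rng.choice(n_topics, size=doc_len, p=theta[pseudo_id])
        # Each word is drawn from the word distribution of its topic.
        words = [int(rng.choice(vocab_size, p=phi[z])) for z in topics]
        docs.append(words)
    return docs, assignments, theta, phi

docs, assignments, theta, phi = generate_short_texts()
print(len(docs), "short texts drawn from", len(theta), "pseudo documents")
```

Because many short texts share one pseudo document, the topic distribution is estimated from far more words than any single short text contains, which is the aggregation effect the abstract refers to.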
Pages: 10
Related papers
50 in total
  • [1] PSLDA: a novel supervised pseudo document-based topic model for short texts
    Mingtao Sun
    Xiaowei Zhao
    Jingjing Lin
    Jian Jing
    Deqing Wang
    Guozhu Jia
    [J]. Frontiers of Computer Science, 2022, 16
  • [2] PSLDA: a novel supervised pseudo document-based topic model for short texts
    Mingtao SUN
    Xiaowei ZHAO
    Jingjing LIN
    Jian JING
    Deqing WANG
    Guozhu JIA
    [J]. Frontiers of Computer Science, 2022, 16 (06) - 81
  • [3] Topic Modeling of Short Texts: A Pseudo-Document View
    Zuo, Yuan
    Wu, Junjie
    Zhang, Hui
    Lin, Hao
    Wang, Fei
    Xu, Ke
    Xiong, Hui
    [J]. KDD'16: PROCEEDINGS OF THE 22ND ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, 2016: 2105 - 2114
  • [4] Biterm Pseudo Document Topic Model for Short Text
    Jiang, Lan
    Lu, Hengyang
    Xu, Ming
    Wang, Chongjun
    [J]. 2016 IEEE 28TH INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE (ICTAI 2016), 2016: 865 - 872
  • [5] Topic Modeling of Short Texts: A Pseudo-Document View With Word Embedding Enhancement
    Zuo, Yuan
    Li, Congrui
    Lin, Hao
    Wu, Junjie
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2023, 35 (01) : 972 - 985
  • [6] A Nested Chinese Restaurant Topic Model for Short Texts with Document Embeddings
    Niu, Yue
    Zhang, Hongjie
    Li, Jing
    [J]. APPLIED SCIENCES-BASEL, 2021, 11 (18):
  • [7] Authorship Attribution for Short Texts with Author-Document Topic Model
    Zhang, Haowen
    Nie, Peng
    Wen, Yanlong
    Yuan, Xiaojie
    [J]. KNOWLEDGE SCIENCE, ENGINEERING AND MANAGEMENT (KSEM 2018), PT I, 2018, 11061 : 29 - 41
  • [8] A Pseudo-document-based Topical N-grams model for short texts
    Hao Lin
    Yuan Zuo
    Guannan Liu
    Hong Li
    Junjie Wu
    Zhiang Wu
    [J]. World Wide Web, 2020, 23 : 3001 - 3023
  • [9] A Pseudo-document-based Topical N-grams model for short texts
    Lin, Hao
    Zuo, Yuan
    Liu, Guannan
    Li, Hong
    Wu, Junjie
    Wu, Zhiang
    [J]. WORLD WIDE WEB-INTERNET AND WEB INFORMATION SYSTEMS, 2020, 23 (06): 3001 - 3023
  • [10] Document-based topic coherence measures for news media text
    Korencic, Damir
    Ristov, Strahil
    Snajder, Jan
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2018, 114 : 357 - 373