PSLDA: a novel supervised pseudo document-based topic model for short texts

Cited by: 4
Authors
Sun, Mingtao [1]
Zhao, Xiaowei [2]
Lin, Jingjing [3]
Jing, Jian [2]
Wang, Deqing [2]
Jia, Guozhu [1]
Affiliations
[1] Beihang Univ, Sch Econ & Management, Beijing 100191, Peoples R China
[2] Beihang Univ, Sch Comp Sci, Beijing 100191, Peoples R China
[3] Beihang Univ, Sch Instrumentat & Optoelect Engn, Beijing 100191, Peoples R China
Keywords
supervised topic model; short text; pseudo-document
DOI
10.1007/s11704-021-0606-3
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Online social media applications such as Twitter and Weibo have produced a huge volume of short texts. However, efficiently mining semantic topics from short texts remains challenging because of the sparsity of word occurrences and the diversity of topics. To address these problems, we propose a novel supervised pseudo-document-based maximum entropy discrimination latent Dirichlet allocation model (PSLDA for short). Specifically, we first assume that short texts are generated from normal-sized latent pseudo-documents and that the topic distributions are sampled from the pseudo-documents. In this way, the model reduces the sparsity of word occurrences and the diversity of topics, because it implicitly aggregates short texts into longer, higher-level pseudo-documents. To make full use of the label information in the training data, we introduce labels into the model and thereby obtain a supervised topic model that learns a reasonable topic distribution. Extensive experiments demonstrate that the proposed method outperforms several state-of-the-art methods.
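The pseudo-document assumption in the abstract can be illustrated with a small generative sketch. This is not the authors' implementation: the symmetric Dirichlet priors, the uniform assignment of short texts to pseudo-documents, the fixed text length, and the names `sample_dirichlet` and `generate_short_texts` are all assumptions made for illustration, and the supervised (maximum entropy discrimination) part of PSLDA is left out entirely. The sketch only shows that topic distributions belong to pseudo-documents, so every short text assigned to the same pseudo-document shares one topic distribution.

```python
import random


def sample_dirichlet(rng, alpha, size):
    """Draw one sample from a symmetric Dirichlet(alpha) via normalized gammas."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(size)]
    total = sum(draws)
    return [x / total for x in draws]


def generate_short_texts(num_texts, num_pseudo_docs, num_topics, vocab_size,
                         words_per_text=5, alpha=0.5, beta=0.1, seed=0):
    """Generate short texts from latent pseudo-documents (hypothetical sketch).

    Returns (texts, assignments, theta): word-id lists, the pseudo-document
    index of each text, and the per-pseudo-document topic distributions.
    """
    rng = random.Random(seed)
    # One word distribution per topic (assumed symmetric Dirichlet prior).
    phi = [sample_dirichlet(rng, beta, vocab_size) for _ in range(num_topics)]
    # One topic distribution per pseudo-document, not per short text.
    theta = [sample_dirichlet(rng, alpha, num_topics)
             for _ in range(num_pseudo_docs)]

    texts, assignments = [], []
    for _ in range(num_texts):
        # Assumption: each short text picks its pseudo-document uniformly.
        p = rng.randrange(num_pseudo_docs)
        words = []
        for _ in range(words_per_text):
            # Topic comes from the pseudo-document's distribution,
            # then the word comes from that topic's word distribution.
            z = rng.choices(range(num_topics), weights=theta[p])[0]
            w = rng.choices(range(vocab_size), weights=phi[z])[0]
            words.append(w)
        texts.append(words)
        assignments.append(p)
    return texts, assignments, theta


texts, assignments, theta = generate_short_texts(
    num_texts=20, num_pseudo_docs=3, num_topics=4, vocab_size=30)
```

With far fewer pseudo-documents than short texts, each topic distribution is effectively estimated from many texts at once, which is the aggregation effect the abstract describes.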
Pages: 10