Biterm Pseudo Document Topic Model for Short Text

Authors
Jiang, Lan [1]
Lu, Hengyang [1]
Xu, Ming [1]
Wang, Chongjun [1]
Affiliations
[1] Nanjing Univ, Dept Comp Sci & Technol, Natl Key Lab Novel Software Technol, Nanjing, Jiangsu, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
DOI
10.1109/ICTAI.2016.131
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In the past few years, we have witnessed the rapid development of online social media, which gives access to a wide variety of short texts. Understanding the topic patterns of these short texts is important. Traditional topic models, such as LDA, are not well suited to short text topic analysis because of data sparsity. Considerable effort has gone into solving this problem, but there is still significant room to improve the effectiveness of these short-text-specific methods. In this paper, we propose a novel method based on the word co-occurrence network, referred to as the biterm pseudo document topic model (BPDTM), which extends the earlier biterm topic model (BTM) for short text. We use the word co-occurrence network to construct biterm pseudo documents. The proposed model is promising because it represents words by their semantically adjacent biterms and can model the corpus-level semantic relation between two words. In addition, BPDTM naturally lengthens the documents, which alleviates the effect of data sparsity on performance. Experiments show that our model outperforms two baselines, LDA and BTM, which demonstrates its effectiveness on the short text topic modeling task.
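The abstract does not spell out how the biterm pseudo documents are built, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that a biterm is an unordered pair of distinct words from the same short text, and that the pseudo document of a word collects every biterm formed by the other words it co-occurs with. The function names `extract_biterms` and `build_pseudo_documents` and the toy corpus are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def extract_biterms(words):
    """Return every unordered pair of distinct words from one short text."""
    return list(combinations(sorted(set(words)), 2))


def build_pseudo_documents(corpus):
    """Map each word to the biterms that co-occur with it across the corpus.

    Assumption (not taken from the paper): the pseudo document of a word
    is the list of biterms formed by the other words in each short text
    that contains the word.
    """
    pseudo_docs = defaultdict(list)
    for text in corpus:
        words = text.lower().split()
        for word in set(words):
            others = [w for w in words if w != word]
            pseudo_docs[word].extend(extract_biterms(others))
    return dict(pseudo_docs)


# Toy corpus of three short texts (illustrative data only).
corpus = [
    "apple iphone launch",
    "apple fruit juice",
    "iphone launch event",
]
pseudo_docs = build_pseudo_documents(corpus)
print(pseudo_docs["apple"])
# [('iphone', 'launch'), ('fruit', 'juice')]
```

In this toy example the pseudo document for "apple" gathers biterms from both of the short texts that mention it, which illustrates how aggregating over the corpus yields longer documents than the original short texts. A topic model such as LDA could then be fitted on these pseudo documents, although whether BPDTM does exactly that is not stated in the abstract.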
Pages: 865-872
Page count: 8
Related papers
  • [1] Relational Biterm Topic Model: Short-Text Topic Modeling using Word Embeddings
    Li, Ximing
    Zhang, Ang
    Li, Changchun
    Guo, Lantian
    Wang, Wenting
    Ouyang, Jihong
    [J]. COMPUTER JOURNAL, 2019, 62 (03): 359-372
  • [2] Dataless Short Text Classification Based on Biterm Topic Model and Word Embeddings
    Yang, Yi
    Wang, Hongan
    Zhu, Jiaqi
    Wu, Yunkun
    Jiang, Kailong
    Guo, Wenli
    Shi, Wandong
    [J]. PROCEEDINGS OF THE TWENTY-NINTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2020: 3969-3975
  • [3] Sparse Biterm Topic Model for Short Texts
    Zhu, Bingshan
    Cai, Yi
    Zhang, Huakui
    [J]. WEB AND BIG DATA, APWEB-WAIM 2021, PT I, 2021, 12858: 227-241
  • [4] Online Biterm Topic Model based short text stream classification using short text expansion and concept drifting detection
    Hu, Xuegang
    Wang, Haiyan
    Li, Peipei
    [J]. PATTERN RECOGNITION LETTERS, 2018, 116: 187-194
  • [5] A Biterm-based Dirichlet Process Topic Model for Short Texts
    Pan, Yali
    Yin, Jian
    Liu, Shaopeng
    Li, Jing
    [J]. PROCEEDINGS OF THE 3RD INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND SERVICE SYSTEM (CSSS), 2014, 109: 301-304
  • [6] PSLDA: a novel supervised pseudo document-based topic model for short texts
    Mingtao SUN
    Xiaowei ZHAO
    Jingjing LIN
    Jian JING
    Deqing WANG
    Guozhu JIA
    [J]. Frontiers of Computer Science, 2022, 16 (06) - 81
  • [7] PSLDA: a novel supervised pseudo document-based topic model for short texts
    Mingtao Sun
    Xiaowei Zhao
    Jingjing Lin
    Jian Jing
    Deqing Wang
    Guozhu Jia
    [J]. Frontiers of Computer Science, 2022, 16
  • [8] PSLDA: a novel supervised pseudo document-based topic model for short texts
    Sun, Mingtao
    Zhao, Xiaowei
    Lin, Jingjing
    Jing, Jian
    Wang, Deqing
    Jia, Guozhu
    [J]. FRONTIERS OF COMPUTER SCIENCE, 2022, 16 (06)
  • [9] A Robust User Sentiment Biterm Topic Mixture Model Based on User Aggregation Strategy to Avoid Data Sparsity for Short Text
    Nimala K
    Jebakumar R
    [J]. Journal of Medical Systems, 2019, 43 (4)
  • [10] A Robust User Sentiment Biterm Topic Mixture Model Based on User Aggregation Strategy to Avoid Data Sparsity for Short Text
    Nimala, K.
    Jebakumar, R.
    [J]. JOURNAL OF MEDICAL SYSTEMS, 2019, 43 (04)