Relational Biterm Topic Model: Short-Text Topic Modeling using Word Embeddings

Cited by: 24
Authors
Li, Ximing [1, 2]
Zhang, Ang [1, 2]
Li, Changchun [1, 2]
Guo, Lantian [3]
Wang, Wenting [2, 4]
Ouyang, Jihong [1, 2]
Affiliations
[1] Jilin Univ, Coll Comp Sci & Technol, Changchun, Jilin, Peoples R China
[2] Jilin Univ, Key Lab Symbol Computat & Knowledge Engn, Minist Educ, Changchun, Jilin, Peoples R China
[3] Northwestern Polytech Univ, Sch Automat, Xian, Shaanxi, Peoples R China
[4] Jilin Univ, Coll Math, Changchun, Jilin, Peoples R China
Source
COMPUTER JOURNAL | 2019 / Vol. 62 / No. 03
Funding
National Natural Science Foundation of China;
Keywords
short text; topic modeling; word embeddings; clustering; text similarity;
DOI
10.1093/comjnl/bxy037
Chinese Library Classification (CLC) number
TP3 [Computing Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Short texts, such as Twitter social media posts, have become increasingly popular on the Internet. Inferring topics from massive numbers of short texts is important to many real-world applications. A single short text often contains only a few words, making traditional topic models less effective. The recently developed biterm topic model (BTM) effectively models short texts by capturing rich global word co-occurrence information. However, in the sparse short-text context, many highly related words may never co-occur. BTM may therefore miss many potentially coherent and prominent word co-occurrence patterns that cannot be observed in the corpus. To address this problem, we propose a novel relational BTM (R-BTM) model, which links short texts using a similarity list of words computed from word embeddings. To evaluate the effectiveness of R-BTM, we compare it against existing short-text topic models on a variety of traditional tasks, including topic quality, clustering and text similarity. Experimental results on real-world datasets indicate that R-BTM outperforms baseline topic models for short texts.
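The two ingredients named in the abstract, word co-occurrence pairs and a word similarity list built from embeddings, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that a biterm is an unordered pair of words co-occurring in one short text, and that a similarity list holds the words whose cosine similarity to a given word meets a threshold. The function names, the threshold of 0.8 and the hand-made 2-d vectors are all hypothetical, and the R-BTM inference procedure itself is not shown.

```python
from itertools import combinations
import math


def extract_biterms(tokens):
    """Return every unordered word pair (biterm) from one short text."""
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similarity_list(word, embeddings, threshold=0.8):
    """Words whose embedding has cosine similarity >= threshold to `word`.

    The threshold value is an assumption chosen for illustration only.
    """
    base = embeddings[word]
    return sorted(
        other
        for other, vec in embeddings.items()
        if other != word and cosine(base, vec) >= threshold
    )


# Toy, hand-made vectors (hypothetical; real use would load trained embeddings).
toy_embeddings = {
    "soccer": [0.9, 0.1],
    "football": [0.85, 0.15],
    "stock": [0.1, 0.9],
}

biterms = extract_biterms(["soccer", "match", "goal"])
similar = similarity_list("soccer", toy_embeddings)
print(biterms)
print(similar)
```

In this toy setting, "soccer" and "football" never need to appear in the same text to be related, which is the kind of unobserved co-occurrence the abstract says plain BTM may miss.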
Pages: 359-372
Page count: 14