Dataless Short Text Classification Based on Biterm Topic Model and Word Embeddings

Cited by: 0
Authors
Yang, Yi [1, 3]
Wang, Hongan [1, 2, 3]
Zhu, Jiaqi [1, 2, 3]
Wu, Yunkun [3]
Jiang, Kailong [3]
Guo, Wenli [3]
Shi, Wandong [3]
Affiliations
[1] Chinese Acad Sci, Inst Software, SKLCS, Beijing, Peoples R China
[2] Zhejiang Lab, Hangzhou, Peoples R China
[3] Univ Chinese Acad Sci, Beijing, Peoples R China
Funding
National Key Research and Development Program of China
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Dataless text classification has attracted increasing attention recently. It needs only a few seed words per category to classify documents, making it much cheaper than supervised text classification, which requires massive labeling effort. However, most existing models focus on long texts and perform unsatisfactorily on short texts, which have become increasingly common on the Internet. In this paper, we first propose a novel model named Seeded Biterm Topic Model (SeedBTM), which extends BTM to solve the problem of dataless short text classification with seed words. It takes advantage of both word co-occurrence information in the topic model and category-word similarity from widely used word embeddings as the prior topic-in-set knowledge. Moreover, with the same approach, we also propose Seeded Twitter Biterm Topic Model (SeedTBTM), which extends Twitter-BTM and uses additional user information to achieve higher classification accuracy. Experimental results on five real short-text datasets show that our models outperform the state-of-the-art methods and perform especially well when the categories are overlapping and interrelated.
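The two signals named in the abstract, word co-occurrence (biterms) and category-word similarity from word embeddings, can be illustrated with a minimal Python sketch. This is not the authors' SeedBTM implementation: the function names, the averaging over seed words, and the 2-d toy vectors are all assumptions made up for illustration, and a real system would load pretrained embeddings instead.

```python
from itertools import combinations
import math


def extract_biterms(tokens):
    """Return every unordered word pair that co-occurs in one short text."""
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def category_word_similarity(word, seed_words, embeddings):
    """Average cosine similarity between a word and a category's seed words.

    Averaging is an assumption of this sketch, not taken from the paper.
    """
    if word not in embeddings:
        return 0.0
    sims = [cosine(embeddings[word], embeddings[s])
            for s in seed_words if s in embeddings]
    return sum(sims) / len(sims) if sims else 0.0


# Toy 2-d vectors, invented for illustration only.
toy_embeddings = {
    "goal": [0.9, 0.1],
    "match": [0.8, 0.2],
    "football": [1.0, 0.0],
    "stock": [0.1, 0.9],
    "market": [0.0, 1.0],
}
# Hypothetical seed words for two hypothetical categories.
seeds = {"sports": ["football"], "finance": ["market"]}

biterms = extract_biterms(["goal", "match", "stock"])
scores = {category: category_word_similarity("goal", words, toy_embeddings)
          for category, words in seeds.items()}
```

In this toy setup, the word "goal" scores higher against the "sports" seed than against the "finance" seed. A seeded topic model could use such scores as prior knowledge that ties words to category topics, while the biterms supply the co-occurrence evidence.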
Pages: 3969-3975
Number of pages: 7