Dataless Short Text Classification Based on Biterm Topic Model and Word Embeddings

Cited by: 0
Authors
Yang, Yi [1, 3]
Wang, Hongan [1, 2, 3]
Zhu, Jiaqi [1, 2, 3]
Wu, Yunkun [3 ]
Jiang, Kailong [3 ]
Guo, Wenli [3 ]
Shi, Wandong [3 ]
Affiliations
[1] Chinese Acad Sci, Inst Software, SKLCS, Beijing, Peoples R China
[2] Zhejiang Lab, Hangzhou, Peoples R China
[3] Univ Chinese Acad Sci, Beijing, Peoples R China
Funding
National Key Research and Development Program of China;
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Dataless text classification has attracted increasing attention recently. It needs only a few seed words per category to classify documents, which is much cheaper than supervised text classification, which requires massive labeling effort. However, most existing models focus on long texts and perform unsatisfactorily on short texts, which have become increasingly common on the Internet. In this paper, we first propose a novel model named Seeded Biterm Topic Model (SeedBTM), which extends BTM to solve the problem of dataless short text classification with seed words. It takes advantage of both the word co-occurrence information in the topic model and the category-word similarity from widely used word embeddings as the prior topic-in-set knowledge. Moreover, with the same approach, we also propose the Seeded Twitter Biterm Topic Model (SeedTBTM), which extends Twitter-BTM and utilizes additional user information to achieve higher classification accuracy. Experimental results on five real short-text datasets show that our models outperform the state-of-the-art methods and perform especially well when the categories are overlapping and interrelated.
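The two ingredients named in the abstract, word co-occurrence (biterms) and category-word similarity from word embeddings, can be illustrated with a minimal Python sketch. This is not the authors' SeedBTM inference procedure: the function names, the toy 2-dimensional embeddings, the seed words, and the choice of averaged cosine similarity are all assumptions made for illustration only.

```python
from itertools import combinations
import math


def extract_biterms(tokens):
    """Return every unordered word pair (biterm) in one short text."""
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def category_word_similarity(word, seed_words, embeddings):
    """Average cosine similarity between a word and a category's seed words.

    Assumption: averaging is one plausible way to score a word against a
    category; the paper may define this similarity differently.
    """
    sims = [cosine(embeddings[word], embeddings[s])
            for s in seed_words if s in embeddings]
    return sum(sims) / len(sims) if sims else 0.0


# Hypothetical toy embeddings and seed words (made up for this sketch).
embeddings = {
    "goal": [1.0, 0.0],
    "football": [0.9, 0.1],
    "stock": [0.0, 1.0],
    "market": [0.1, 0.9],
}
seeds = {"sports": ["football"], "business": ["market"]}

# Biterms of a three-word short text.
biterms = extract_biterms(["goal", "football", "stock"])

# Score the word "goal" against each category and pick the closest one.
scores = {cat: category_word_similarity("goal", words, embeddings)
          for cat, words in seeds.items()}
best = max(scores, key=scores.get)
```

In a seeded topic model, similarity scores of this kind would typically act as prior knowledge that biases each topic toward one category, while the biterms supply the co-occurrence evidence; how the two are combined is specific to the paper and is not reproduced here.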
Pages: 3969-3975
Number of pages: 7
Related papers
50 in total
  • [31] Topic Discovery for Short Texts Using Word Embeddings
    Xun, Guangxu
    Gopalakrishnan, Vishrawas
    Ma, Fenglong
    Li, Yaliang
    Gao, Jing
    Zhang, Aidong
    2016 IEEE 16TH INTERNATIONAL CONFERENCE ON DATA MINING (ICDM), 2016, : 1299 - 1304
  • [32] Topic Modeling for Short Texts with Auxiliary Word Embeddings
    Li, Chenliang
    Wang, Haoran
    Zhang, Zhiqian
    Sun, Aixin
    Ma, Zongyang
    SIGIR'16: PROCEEDINGS OF THE 39TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, 2016, : 165 - 174
  • [33] A Robust User Sentiment Biterm Topic Mixture Model Based on User Aggregation Strategy to Avoid Data Sparsity for Short Text
    Nimala, K.
    Jebakumar, R.
    JOURNAL OF MEDICAL SYSTEMS, 2019, 43 (04)
  • [34] A Correlated Topic Model Using Word Embeddings
    Xun, Guangxu
    Li, Yaliang
    Zhao, Wayne Xin
    Gao, Jing
    Zhang, Aidong
    PROCEEDINGS OF THE TWENTY-SIXTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2017, : 4207 - 4213
  • [35] News Text Classification Model Based on Topic Model
    Li, Zhenzhong
    Shang, Wenqian
    Yan, Menghan
    2016 IEEE/ACIS 15TH INTERNATIONAL CONFERENCE ON COMPUTER AND INFORMATION SCIENCE (ICIS), 2016, : 1197 - 1201
  • [36] Word co-occurrence augmented topic model in short text
    Chen, Guan-Bin
    Kao, Hung-Yu
    INTELLIGENT DATA ANALYSIS, 2017, 21 : S55 - S70
  • [37] A Dirichlet process biterm-based mixture model for short text stream clustering
    Chen, Junyang
    Gong, Zhiguo
    Liu, Weiwen
    APPLIED INTELLIGENCE, 2020, 50 (05) : 1609 - 1619
  • [38] Short text classification using semantically enriched topic model
    Uddin, Farid
    Chen, Yibo
    Zhang, Zuping
    Huang, Xin
    JOURNAL OF INFORMATION SCIENCE, 2024,
  • [39] Learning from Few Samples: Lexical Substitution with Word Embeddings for Short Text Classification
    Elekes, Abel
    Di Stefano, Antonino Simone
    Schaeler, Martin
    Boehm, Klemens
    Keller, Matthias
    2019 ACM/IEEE JOINT CONFERENCE ON DIGITAL LIBRARIES (JCDL 2019), 2019, : 111 - 119
  • [40] A Method of Subtopic Classification of Search Engine Suggests by Integrating a Topic Model and Word Embeddings
    Nie, Tian
    Ding, Yi
    Zhao, Chen
    Lin, Youchao
    Utsuro, Takehito
    INTERNATIONAL JOURNAL OF SOFTWARE INNOVATION, 2018, 6 (03) : 67 - 78