A Hybrid Approach for Sparse Data Classification Based on Topic Model

Cited by: 1
Authors
Wang, Guangjing [1]
Zhang, Jie [1]
Yang, Xiaobin [1]
Li, Li [1]
Affiliation
[1] Southwest Univ, Fac Comp & Informat Sci, Chongqing 400715, Peoples R China
DOI
10.1007/978-3-319-47121-1_2
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
With the growing volume of short texts, sparse text classification is becoming crucial in data mining and information retrieval. Many efforts have been devoted to improving the efficiency of normal text classification. However, methods for processing high-dimensional, sparse data remain immature. In this paper, we present a new method that combines the Biterm Topic Model (BTM) with a Support Vector Machine (SVM). Although BTM significantly reduces the dimensionality of the training data, it still preserves rich semantic information for the sparse data. We then apply SVM to the generated topics, which serve as features. Experiments on the 20 Newsgroups and Tencent microblog datasets demonstrate that our approach achieves excellent classifier performance in terms of precision, recall, and F1 measure. Furthermore, the proposed method is shown to be highly efficient compared with the combination of Latent Dirichlet Allocation (LDA) and SVM. Our method enhances the previous work in this field and establishes the foundation for further studies.
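As a rough illustration of the idea behind BTM mentioned in the abstract, the sketch below extracts biterms, that is, unordered pairs of words that co-occur in the same short text. It covers only this preprocessing step, not topic inference or the SVM stage, and it is not taken from the paper: the function names `extract_biterms` and `biterm_counts` and the toy documents are assumptions made for this example.

```python
from collections import Counter
from itertools import combinations


def extract_biterms(tokens):
    """Return all unordered word pairs (biterms) from one tokenized short text."""
    # Sorting each pair makes ("text", "sparse") and ("sparse", "text") the same biterm.
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


def biterm_counts(docs):
    """Count biterms over a whole corpus of tokenized short texts."""
    counts = Counter()
    for tokens in docs:
        counts.update(extract_biterms(tokens))
    return counts


# Toy corpus (hypothetical data, for illustration only).
docs = [
    ["sparse", "text", "classification"],
    ["short", "text", "classification"],
]
counts = biterm_counts(docs)
```

Counting word pairs over the whole corpus rather than per document is what lets a biterm-based model cope with very short texts, where per-document word counts are too sparse to be informative.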
Pages: 17-28
Number of pages: 12
Related papers
50 in total
  • [31] Classification of Text Documents Based on a Probabilistic Topic Model
    Karpovich, S. N.
    Smirnov, A. V.
    Teslya, N. N.
    SCIENTIFIC AND TECHNICAL INFORMATION PROCESSING, 2019, 46 (05) : 314 - 320
  • [32] Classification of Text Documents Based on a Probabilistic Topic Model
    S. N. Karpovich
    A. V. Smirnov
    N. N. Teslya
    Scientific and Technical Information Processing, 2019, 46 : 314 - 320
  • [33] SHORT TEXT CLASSIFICATION BASED ON LDA TOPIC MODEL
    Chen, Qiuxing
    Yao, Lixiu
    Yang, Jie
    PROCEEDINGS OF 2016 INTERNATIONAL CONFERENCE ON AUDIO, LANGUAGE AND IMAGE PROCESSING (ICALIP), 2016, : 749 - 753
  • [34] Topic Model Based Multi-Label Classification
    Padmanabhan, Divya
    Bhat, Satyanath
    Shevade, Shirish
    Narahari, Y.
    2016 IEEE 28TH INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE (ICTAI 2016), 2016, : 996 - 1003
  • [35] Image classification based on sparse-coded features using sparse coding technique for aerial imagery: a hybrid dictionary approach
    Qayyum, Abdul
    Malik, Aamir Saeed
    Saad, Naufal M.
    Iqbal, Mahboob
    Abdullah, Mohd Faris
    Rasheed, Waqas
    Abdullah, Tuan A. B. Rashid
    Bin Jafaar, Mohd Yaqoob
    NEURAL COMPUTING & APPLICATIONS, 2019, 31 (08) : 3587 - 3607
  • [36] Image classification based on sparse-coded features using sparse coding technique for aerial imagery: a hybrid dictionary approach
    Abdul Qayyum
    Aamir Saeed Malik
    Naufal M. Saad
    Mahboob Iqbal
    Mohd Faris Abdullah
    Waqas Rasheed
    Tuan A. B. Rashid Abdullah
    Mohd Yaqoob Bin Jafaar
    Neural Computing and Applications, 2019, 31 : 3587 - 3607
  • [37] Query Classification Based on Regularized Correlated Topic Model
    Zhai, Haijun
    Guo, Jiafeng
    Wu, Qiong
    Cheng, Xueqi
    Sheng, Huawei
    Zhang, Jin
    2009 IEEE/WIC/ACM INTERNATIONAL JOINT CONFERENCES ON WEB INTELLIGENCE (WI) AND INTELLIGENT AGENT TECHNOLOGIES (IAT), VOL 1, 2009, : 552 - +
  • [38] A Sparse Topic Model for Bursty Topic Discovery in Social Networks
    Shi, Lei
    Du, Junping
    Kou, Feifei
    INTERNATIONAL ARAB JOURNAL OF INFORMATION TECHNOLOGY, 2020, 17 (05) : 816 - 824
  • [39] Intelligent radar software defect classification approach based on the latent Dirichlet allocation topic model
    Liu, Xi
    Yin, Yongfeng
    Li, Haifeng
    Chen, Jiabin
    Liu, Chang
    Wang, Shengli
    Yin, Rui
    EURASIP JOURNAL ON ADVANCES IN SIGNAL PROCESSING, 2021, 2021 (01)
  • [40] Intelligent radar software defect classification approach based on the latent Dirichlet allocation topic model
    Xi Liu
    Yongfeng Yin
    Haifeng Li
    Jiabin Chen
    Chang Liu
    Shengli Wang
    Rui Yin
    EURASIP Journal on Advances in Signal Processing, 2021