A Hybrid Approach for Sparse Data Classification Based on Topic Model

Cited by: 1
Authors
Wang, Guangjing [1]
Zhang, Jie [1]
Yang, Xiaobin [1]
Li, Li [1]
Affiliation
[1] Southwest University, Faculty of Computer and Information Science, Chongqing 400715, People's Republic of China
DOI
10.1007/978-3-319-47121-1_2
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
With the growing volume of short texts, sparse text classification is becoming crucial in data mining and information retrieval. Much effort has been devoted to improving the efficiency of conventional text classification, but methods for processing high-dimensional, sparse data remain immature. In this paper, we present a new method that combines the Biterm Topic Model (BTM) with a Support Vector Machine (SVM). BTM significantly reduces the dimensionality of the training data while retaining rich semantic information for the sparse data. We then apply the SVM to the generated topics as features. Experiments on the 20 Newsgroups and Tencent microblog datasets demonstrate that our approach achieves excellent classification performance in terms of precision, recall, and F1 measure. Furthermore, the experiments show that the proposed method is more efficient than the combination of Latent Dirichlet Allocation (LDA) and SVM. Our method improves on previous work in this field and lays a foundation for further studies.
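The abstract gives no implementation details. As a rough illustration of the idea behind BTM (modeling unordered word pairs, or biterms, that co-occur in the same short text and pooling them across the whole corpus), a minimal Python sketch might look like the following. The function names and the toy corpus are hypothetical and not taken from the paper; topic inference and the SVM stage are omitted.

```python
from collections import Counter
from itertools import combinations


def extract_biterms(tokens):
    """Return every unordered pair of tokens from one short text."""
    # Sort each pair so ("topic", "svm") and ("svm", "topic") are the same biterm.
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


def corpus_biterm_counts(docs):
    """Pool the biterms of all documents into one corpus-level count table."""
    counts = Counter()
    for tokens in docs:
        counts.update(extract_biterms(tokens))
    return counts


# Hypothetical toy corpus of tokenized short texts (not the paper's datasets).
docs = [
    ["topic", "model", "svm"],
    ["topic", "svm"],
    ["short", "text"],
]

counts = corpus_biterm_counts(docs)
print(counts[("svm", "topic")])
```

Pooling biterms over the corpus, rather than counting words per document, is what lets this kind of model cope with very short texts: each individual document contributes only a handful of co-occurrences, but the corpus-level table is much denser.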
Pages: 17-28
Number of pages: 12