A Hybrid Approach for Sparse Data Classification Based on Topic Model

Cited by: 1
Authors
Wang, Guangjing [1]
Zhang, Jie [1]
Yang, Xiaobin [1]
Li, Li [1]
Affiliations
[1] Southwest Univ, Fac Comp & Informat Sci, Chongqing 400715, Peoples R China
DOI
10.1007/978-3-319-47121-1_2
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
With the growing volume of short texts, sparse text classification is becoming crucial in data mining and information retrieval. Many efforts have been devoted to improving the efficiency of normal text classification. However, the processing of high-dimensional, sparse data remains immature. In this paper, we present a new method that combines the Biterm Topic Model (BTM) and the Support Vector Machine (SVM). BTM significantly reduces the dimensionality of the training data while still retaining rich semantic information for the sparse data. We then apply SVM to the generated topics, or features. Experiments on the 20 Newsgroups and Tencent microblog datasets demonstrate that our approach achieves excellent classifier performance in terms of precision, recall, and F1 measure. Furthermore, the proposed method is shown to be highly efficient compared with the combination of Latent Dirichlet Allocation (LDA) and SVM. Our method enhances previous work in this field and lays a foundation for further studies.
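As a rough illustration of why a biterm-based model suits short texts, the sketch below extracts biterms (unordered word pairs that co-occur in the same short text) and counts them over a toy corpus. It is a minimal sketch, not the authors' implementation: the function names `extract_biterms` and `corpus_biterm_counts` and the sample documents are hypothetical, and the later stages the abstract mentions (inferring topic distributions with BTM and training an SVM on them) are omitted.

```python
from collections import Counter
from itertools import combinations


def extract_biterms(tokens):
    """Return every unordered word pair from one tokenized short text."""
    # Sort each pair so that (a, b) and (b, a) count as the same biterm.
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


def corpus_biterm_counts(docs):
    """Aggregate biterm counts over the whole corpus (hypothetical helper)."""
    counts = Counter()
    for tokens in docs:
        counts.update(extract_biterms(tokens))
    return counts


# Toy corpus of two tokenized short texts (illustrative data only).
docs = [
    ["apple", "iphone", "release"],
    ["iphone", "apple", "price"],
]
counts = corpus_biterm_counts(docs)
```

Because the counts are pooled across the corpus, a pair such as `("apple", "iphone")` accumulates evidence from several short texts even though each text on its own contains very few words.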
Pages: 17-28
Page count: 12
Related papers
50 in total
  • [21] A Hybrid Classification Approach using Topic Modeling and Graph Convolution Networks
    Singh, Thoudam Doren
    Divyansha
    Singh, Apoorva Vikram
    Khilji, Abdullah Faiz Ur Rahman
    2020 INTERNATIONAL CONFERENCE ON COMPUTATIONAL PERFORMANCE EVALUATION (COMPE-2020), 2020, : 285 - 289
  • [22] Sentiment Classification: A Topic Sequence-Based Approach
    Song, Xuliang
    Liang, Jiguang
    Hu, Chengcheng
    JOURNAL OF COMPUTERS, 2016, 11 (01) : 1 - 9
  • [23] A hybrid data perturbation and mean clustering approach based privacy preserving classification model for large databases
    Gouse, Sk. Mohammed
    Mohan, G. Krishna
    INTERNATIONAL JOURNAL OF SYSTEM ASSURANCE ENGINEERING AND MANAGEMENT, 2022,
  • [24] DECT sparse reconstruction based on hybrid spectrum data generative diffusion model
    Liu, Jin
    Wu, Fan
    Zhan, Guorui
    Wang, Kun
    Zhang, Yikun
    Hu, Dianlin
    Chen, Yang
    COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE, 2025, 261
  • [25] A hybrid ASM approach for sparse volumetric data segmentation
    Zhu Y.
    Williams S.
    Zwiggelaar R.
    Pattern Recognition and Image Analysis, 2007, 17 (02) : 252 - 258
  • [26] A new big data approach for topic classification and sentiment analysis of Twitter data
    Anisha P. Rodrigues
    Niranjan N. Chiplunkar
    Evolutionary Intelligence, 2022, 15 : 877 - 887
  • [27] A new big data approach for topic classification and sentiment analysis of Twitter data
    Rodrigues, Anisha P.
    Chiplunkar, Niranjan N.
    EVOLUTIONARY INTELLIGENCE, 2022, 15 (02) : 877 - 887
  • [28] A cluster-based hybrid sampling approach for imbalanced data classification
    Feng, Shou
    Zhao, Chunhui
    Fu, Ping
    REVIEW OF SCIENTIFIC INSTRUMENTS, 2020, 91 (05):
  • [29] Topic document model approach for naive Bayes text classification
    Kim, SB
    Rim, HC
    Kim, JD
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2005, E88D (05) : 1091 - 1094
  • [30] An Embedding-Based Topic Model for Document Classification
    Seifollahi, Sattar
    Piccardi, Massimo
    Jolfaei, Alireza
    ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2021, 20 (03)