A Hybrid Approach for Sparse Data Classification Based on Topic Model

Cited by: 1
Authors
Wang, Guangjing [1]
Zhang, Jie [1]
Yang, Xiaobin [1]
Li, Li [1]
Affiliations
[1] Southwest Univ, Fac Comp & Informat Sci, Chongqing 400715, Peoples R China
Source
Keywords
DOI
10.1007/978-3-319-47121-1_2
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
With the growing volume of short texts, sparse text classification is becoming crucial in data mining and information retrieval. Many efforts have been devoted to improving the efficiency of conventional text classification, but methods for high-dimensional, sparse data remain immature. In this paper, we present a new method that combines the Biterm Topic Model (BTM) with a Support Vector Machine (SVM). BTM significantly reduces the dimensionality of the training data while still preserving rich semantic information for the sparse data. We then apply the SVM to the generated topics, which serve as features. Experiments on the 20 Newsgroups and Tencent microblog datasets show that our approach achieves strong classifier performance in terms of precision, recall, and F1 measure. The results further indicate that the proposed method is more efficient than the combination of Latent Dirichlet Allocation (LDA) and SVM. Our method builds on previous work in this field and lays a foundation for further studies.
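As a rough illustration of the BTM side of the method, the sketch below extracts biterms (unordered word pairs that co-occur in the same short text), which are the units BTM models in place of per-document word counts. It assumes each short text is treated as a single context window. The function names `extract_biterms` and `corpus_biterm_counts` and the toy documents are hypothetical and not taken from the paper. Topic inference and the SVM step are not shown.

```python
from collections import Counter
from itertools import combinations


def extract_biterms(tokens):
    """Return every unordered pair of token positions in one short text.

    Assumption: the whole short text is one context window, so any two
    tokens in it form a biterm.
    """
    # Sort each pair so ("svm", "topic") and ("topic", "svm") are one biterm.
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


def corpus_biterm_counts(docs):
    """Aggregate biterm counts over a corpus of tokenized short texts."""
    counts = Counter()
    for tokens in docs:
        counts.update(extract_biterms(tokens))
    return counts


# Hypothetical toy corpus of two tokenized short texts.
docs = [
    ["sparse", "text", "classification"],
    ["topic", "model", "text"],
]
counts = corpus_biterm_counts(docs)
```

Because biterms are pooled over the whole corpus, co-occurrence statistics remain usable even when each individual text contains only a few words. This is the usual motivation given for BTM on sparse data.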
Pages: 17-28
Page count: 12
Related papers
50 in total
  • [41] A hybrid approach for classification of rare class data
    Wankhade, Kapil Keshao
    Jondhale, Kalpana C.
    Thool, Vijaya R.
    KNOWLEDGE AND INFORMATION SYSTEMS, 2018, 56 (01) : 197 - 221
  • [42] A Hybrid Approach for Binary Classification of Imbalanced Data
    Tsai, Hsinhan
    Yang, Ta-Wei
    Wong, Wai-Man
    Kao, Han-Yi
    Chou, Cheng-Fu
    INTERNATIONAL JOURNAL OF COMPUTATIONAL INTELLIGENCE AND APPLICATIONS, 2024, 23 (03)
  • [43] A Hybrid Approach for Breast Tissue Data Classification
    Prasad, Dilip Kumar
    Quek, Chai
    Leung, Maylor K. H.
    TENCON 2009 - 2009 IEEE REGION 10 CONFERENCE, VOLS 1-4, 2009, : 1151 - 1154
  • [44] A hybrid approach for classification of rare class data
    Kapil Keshao Wankhade
    Kalpana C. Jondhale
    Vijaya R. Thool
    Knowledge and Information Systems, 2018, 56 : 197 - 221
  • [45] A Hybrid Approach for Cases Classification of Medical Data
    Chen, Xiaoyu
    Liu, Bo
    Xia, Xin
    FRONTIERS OF MANUFACTURING AND DESIGN SCIENCE IV, PTS 1-5, 2014, 496-500 : 1965 - 1970
  • [46] Pattern Based Topic Model for Data Mining
    Jadhav, B. S.
    Bhosale, D. S.
    Jadhav, D. S.
    2016 INTERNATIONAL CONFERENCE ON INVENTIVE COMPUTATION TECHNOLOGIES (ICICT), VOL 2, 2016, : 382 - 387
  • [47] A hybrid approach of topic model and matrix factorization based on two-step recommendation framework
    Zhao, Xiangyu
    Niu, Zhendong
    Chen, Wei
    Shi, Chongyang
    Niu, Ke
    Liu, Donglei
    JOURNAL OF INTELLIGENT INFORMATION SYSTEMS, 2015, 44 (03) : 335 - 353
  • [48] A hybrid approach of topic model and matrix factorization based on two-step recommendation framework
    Xiangyu Zhao
    Zhendong Niu
    Wei Chen
    Chongyang Shi
    Ke Niu
    Donglei Liu
    Journal of Intelligent Information Systems, 2015, 44 : 335 - 353
  • [49] Topic-based habitat classification using visual data
    Pizarro, Oscar
    Williams, Stefan B.
    Colquhoun, Jamie
    OCEANS 2009 - EUROPE, VOLS 1 AND 2, 2009, : 1320 - +
  • [50] DEEP LEARNING CROP CLASSIFICATION APPROACH BASED ON SPARSE CODING OF TIME SERIES OF SATELLITE DATA
    Lavreniuk, Mykola
    Kussul, Nataliia
    Novikov, Alexei
    IGARSS 2018 - 2018 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM, 2018, : 4812 - 4815