Short text classification based on strong feature thesaurus

Cited by: 31
Authors
Wang, Bing-kun [1, 2]
Huang, Yong-feng [1, 2]
Yang, Wan-xia [1, 2]
Li, Xing [1, 2]
Affiliations
[1] Tsinghua Univ, Dept Elect & Engn, Informat Cognit & Intelligent Syst Res Inst, Beijing 100084, Peoples R China
[2] Tsinghua Univ, Informat Technol Natl Lab, Beijing 100084, Peoples R China
Keywords
Short text; Classification; Data sparseness; Semantic; Strong feature thesaurus (SFT); Latent Dirichlet allocation (LDA)
DOI
10.1631/jzus.C1100373
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Data sparseness, the evident characteristic of short text, has always been regarded as the main cause of the low accuracy in the classification of short texts using statistical methods. Intensive research has been conducted in this area during the past decade. However, most researchers failed to notice that ignoring the semantic importance of certain feature terms might also contribute to low classification accuracy. In this paper we present a new method to tackle the problem by building a strong feature thesaurus (SFT) based on latent Dirichlet allocation (LDA) and information gain (IG) models. By giving larger weights to feature terms in SFT, the classification accuracy can be improved. Specifically, our method appeared to be more effective with more detailed classification. Experiments in two short text datasets demonstrate that our approach achieved improvement compared with the state-of-the-art methods including support vector machine (SVM) and Naïve Bayes Multinomial.
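The abstract names information gain (IG) and larger weights for SFT terms but gives no formulas, so the following is only a minimal sketch of that general idea, not the authors' procedure. It scores each term by the IG of the binary feature "term occurs in the document" and then boosts the counts of high-IG terms. The toy corpus, the 0.9 threshold, the `boost` factor, and all function names are assumptions made for illustration; the LDA step is omitted entirely.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(docs, labels, term):
    """IG of the binary feature 'term occurs in the document'."""
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    # Class entropy remaining after splitting on presence/absence of the term.
    conditional = sum(len(part) / len(labels) * entropy(part)
                      for part in (with_term, without_term) if part)
    return entropy(labels) - conditional


def weighted_counts(doc, strong_terms, boost=2.0):
    """Term counts, with strong terms multiplied by a hypothetical boost."""
    counts = Counter(doc)
    return {t: c * (boost if t in strong_terms else 1.0)
            for t, c in counts.items()}


# Toy corpus invented for illustration: tokenized short texts and labels.
docs = [["cheap", "flight", "deal"], ["flight", "delay", "news"],
        ["cheap", "phone", "deal"], ["phone", "launch", "news"]]
labels = ["ad", "news", "ad", "news"]

vocab = sorted({t for d in docs for t in d})
ig = {t: information_gain(docs, labels, t) for t in vocab}

# The 0.9-bit threshold is an arbitrary assumption, not a value from the paper.
strong_terms = {t for t, gain in ig.items() if gain >= 0.9}
features = weighted_counts(docs[0], strong_terms)
```

On this toy data, terms that perfectly separate the two classes receive the maximum IG of 1 bit and are boosted, while a term such as "flight", which appears equally in both classes, scores 0 and keeps its raw count. The weighted counts could then feed any standard classifier, such as the SVM or multinomial Naïve Bayes baselines the abstract mentions.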
Pages: 649-659
Number of pages: 11
Source
Journal of Zhejiang University SCIENCE C, 2012, 13: 649-659