Short text classification based on strong feature thesaurus

Authors
Bingkun WANG [1,2], Yongfeng HUANG [1,2], Wanxia YANG [1,2], Xing LI [1,2]
Affiliations
[1] Information Cognitive and Intelligent System Research Institute, Department of Electronic Engineering, Tsinghua University, Beijing 100084, China
[2] Information Technology National Laboratory, Tsinghua University, Beijing 100084, China
DOI
Not available
Chinese Library Classification number
TP391.1 [Text information processing];
Abstract
Data sparseness, the defining characteristic of short text, has long been regarded as the main cause of low accuracy when short texts are classified with statistical methods. Intensive research has been conducted in this area during the past decade. However, most researchers have overlooked that ignoring the semantic importance of certain feature terms may also contribute to low classification accuracy. In this paper we present a new method to tackle the problem by building a strong feature thesaurus (SFT) based on latent Dirichlet allocation (LDA) and information gain (IG) models. Giving larger weights to the feature terms in the SFT improves classification accuracy. In particular, our method appeared to be more effective when the classification is more fine-grained. Experiments on two short text datasets demonstrate that our approach improves on state-of-the-art methods, including support vector machine (SVM) and naïve Bayes multinomial.
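The abstract does not say how the SFT is built or how large the extra weights are, so the following is only a minimal sketch of the general idea, not the authors' method. It ranks terms by information gain over a toy corpus, treats the top-ranked terms as a stand-in "strong feature" set, and boosts their counts in a bag-of-words representation. The LDA component is omitted entirely. The function names, the `top_k` cutoff, the `boost` factor and the toy data are all assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(docs, labels, term):
    """IG of the binary feature 'term occurs in the document'."""
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    conditional = (len(with_term) / n) * entropy(with_term) \
        + (len(without_term) / n) * entropy(without_term)
    return entropy(labels) - conditional


def select_strong_features(docs, labels, top_k):
    """Hypothetical stand-in for an SFT: the top_k terms ranked by IG."""
    vocab = sorted({t for d in docs for t in d})
    ranked = sorted(vocab,
                    key=lambda t: information_gain(docs, labels, t),
                    reverse=True)
    return set(ranked[:top_k])


def weighted_counts(doc, strong_features, boost=2.0):
    """Term counts, with strong-feature terms multiplied by `boost` (assumed value)."""
    counts = Counter(doc)
    return {t: c * (boost if t in strong_features else 1.0)
            for t, c in counts.items()}


# Toy corpus of tokenized short texts (invented for this sketch).
docs = [
    ["cheap", "flight", "deal"],
    ["flight", "delay", "news"],
    ["match", "score", "news"],
    ["match", "ticket", "deal"],
]
labels = ["travel", "travel", "sport", "sport"]

strong = select_strong_features(docs, labels, top_k=2)
features = weighted_counts(docs[0], strong)
```

In this toy corpus, "flight" and "match" each separate the two classes perfectly, so they rank highest and receive the boosted weight, while terms such as "deal" that occur equally in both classes keep their raw count. The resulting weighted vectors could then be passed to any standard classifier.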
Source: Journal of Zhejiang University-SCIENCE C (Computers & Electronics), 2012, 13 (09)
Pages: 649-659
Page count: 11