Short text classification based on strong feature thesaurus

Authors
Bingkun WANG (1, 2), Yongfeng HUANG (1, 2), Wanxia YANG (1, 2), Xing LI (1, 2)
Affiliations
1. Information Cognitive and Intelligent System Research Institute, Department of Electronic Engineering, Tsinghua University, Beijing 100084, China
2. Information Technology National Laboratory, Tsinghua University, Beijing 100084, China
DOI
Not available
Chinese Library Classification (CLC) number
TP391.1 [Text information processing]
Abstract
Data sparseness, the defining characteristic of short text, has long been regarded as the main cause of low accuracy when short texts are classified with statistical methods, and intensive research has addressed it over the past decade. However, most researchers have failed to notice that ignoring the semantic importance of certain feature terms may also contribute to low classification accuracy. In this paper we present a new method that tackles this problem by building a strong feature thesaurus (SFT) based on latent Dirichlet allocation (LDA) and information gain (IG) models. Giving larger weights to feature terms in the SFT improves classification accuracy. In particular, the method appeared to be more effective when the classification was more detailed. Experiments on two short text datasets demonstrate that our approach improves on state-of-the-art methods, including support vector machine (SVM) and Naïve Bayes Multinomial.
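
The abstract does not say how LDA and IG are combined or how large the extra weight is, so the following is only a rough sketch of the IG half of the idea, not the authors' implementation. It ranks terms by information gain, keeps the top few as a stand-in "strong feature thesaurus", and multiplies their counts by a boost factor. The function names, the `top_k` and `boost` parameters, and the toy data are all assumptions made for illustration; the LDA step is omitted entirely.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (in bits) of a collection of class counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(docs, labels, term):
    """IG(term) = H(class) - H(class | term present or absent)."""
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    h_class = entropy(Counter(labels).values())
    h_cond = 0.0
    for subset in (with_term, without_term):
        if subset:
            share = len(subset) / len(labels)
            h_cond += share * entropy(Counter(subset).values())
    return h_class - h_cond


def build_thesaurus(docs, labels, top_k):
    """Hypothetical stand-in for the SFT: the top_k terms ranked by IG."""
    vocab = sorted({t for d in docs for t in d})
    ranked = sorted(
        vocab, key=lambda t: information_gain(docs, labels, t), reverse=True
    )
    return set(ranked[:top_k])


def weighted_counts(doc, thesaurus, boost=2.0):
    """Term counts, with thesaurus terms scaled by an assumed boost factor."""
    counts = Counter(doc)
    return {t: c * (boost if t in thesaurus else 1.0) for t, c in counts.items()}


# Toy corpus (invented for this sketch): tokenized short texts and labels.
docs = [
    ["cheap", "flight", "deal"],
    ["flight", "delay", "news"],
    ["cheap", "hotel", "deal"],
    ["hotel", "strike", "news"],
]
labels = ["ad", "news", "ad", "news"]

thesaurus = build_thesaurus(docs, labels, top_k=3)
features = weighted_counts(docs[0], thesaurus, boost=2.0)
```

The weighted counts could then be fed to any bag-of-words classifier in place of raw term frequencies; which classifier and which boost value work best would have to be determined experimentally.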
Pages: 649 - 659
Page count: 11