Feature Extension for Chinese Short Text Classification Based on Topical N-Grams

Cited by: 0
Authors
Sun, Baoshan [1 ]
Zhao, Peng [1 ]
Affiliations
[1] Tianjin Polytech Univ, Sch Comp Sci & Software Engn, Tianjin, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Topical N-Grams; LDA; Short Texts Classification; Feature Extension; SVM;
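DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract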
Because of the feature sparseness problem, conventional text classification methods rarely perform well on short texts. This paper presents a novel feature extension method based on the Topical N-Grams (TNG) model to solve this problem. The TNG algorithm can infer not only the unigram word distribution but also the phrase distribution for each topic, so we use it to build a feature extension library. Based on the original features in the short texts, we compute the topic tendency of each text. According to the topic tendency, appropriate candidate words and phrases are selected from the feature extension library and added to the original short texts. After extending the features, we use the LDA and SVM algorithms to classify the expanded short texts and use precision, recall, and F1-score to evaluate classification performance. The results show that our method can significantly improve classification performance.
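The extension step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the library contents, the probabilities, the function names `topic_tendency` and `extend_features`, and the `top_k` and `min_tendency` parameters are all hypothetical, and the topic tendency is approximated here by a simple normalized sum of per-topic term probabilities rather than by TNG inference.

```python
# Hypothetical feature extension library: topic -> {term: P(term | topic)}.
# In the paper this would come from a trained TNG model; these values are made up.
EXTENSION_LIBRARY = {
    "sports": {"football": 0.30, "world cup": 0.25, "match": 0.20, "coach": 0.10},
    "finance": {"stock": 0.35, "stock market": 0.25, "bank": 0.20, "interest rate": 0.10},
}


def topic_tendency(tokens, library):
    """Score each topic by summing P(term | topic) over the tokens, then normalize."""
    scores = {
        topic: sum(dist.get(tok, 0.0) for tok in tokens)
        for topic, dist in library.items()
    }
    total = sum(scores.values())
    if total == 0:
        # No token is known to any topic: no tendency can be computed.
        return {topic: 0.0 for topic in library}
    return {topic: score / total for topic, score in scores.items()}


def extend_features(tokens, library, top_k=2, min_tendency=0.5):
    """Append the top_k most probable unseen terms of each sufficiently likely topic."""
    tendency = topic_tendency(tokens, library)
    extended = list(tokens)
    for topic, weight in tendency.items():
        if weight < min_tendency:
            continue
        # Rank the topic's words and phrases by probability, highest first.
        ranked = sorted(library[topic].items(), key=lambda kv: kv[1], reverse=True)
        new_terms = [term for term, _ in ranked if term not in extended][:top_k]
        extended.extend(new_terms)
    return extended


if __name__ == "__main__":
    print(extend_features(["match", "coach"], EXTENSION_LIBRARY))
```

In this toy setup the extended token list, which now includes phrases such as "world cup" as single features, would then be passed to whatever classifier is used downstream (LDA features with an SVM in the paper).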
Pages: 477-482
Number of pages: 6