Short Text Classification Based on Feature Extension Using The N-Gram Model

Cited by: 0
Authors
Zhang, Xinwei [1]
Wu, Bin [1]
Affiliations
[1] Beijing Univ Posts & Telecommun, Beijing Key Lab Intelligent Telecommun Software &, Beijing, Peoples R China
Keywords
Short Text; Classification; The N-Gram Model; Feature Extension; Naive Bayes;
DOI
Not available
Chinese Library Classification (CLC) number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
With the rapid development of Web 2.0, more and more people share their lives and opinions on social media websites and forums such as Weibo, Twitter and Tianya, which produces masses of short texts. To manage these short texts effectively, short text classification has become an important branch of text classification. However, because short texts are brief, carry few signals, and have sparse features, it is very difficult to achieve high-quality classification with conventional methods. This paper proposes a novel feature extension method based on the N-Gram model to address feature sparseness. From continuous word sequences in the training set, we extract n-grams to form our feature extension pattern library. Then, using the features that appear in a short text, we compute the occurrence probability of other words that do not appear in the original text. We apply our extension method to a data set collected from Sina Weibo. After extending the features of the original short texts, we use the Naive Bayes algorithm to train and evaluate a classifier. We use precision, recall and F1-score to evaluate our work. The results show that the extension method based on the N-Gram model noticeably improves classification performance.
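The extension step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes bigrams only, pre-tokenized input, and a fixed probability threshold, and the names build_bigram_library, extend_features and threshold are hypothetical. A word is added to a short text when its estimated probability of following a word already present in the text meets the threshold.

```python
from collections import Counter, defaultdict


def build_bigram_library(tokenized_docs):
    """Count, for every word, which words directly follow it in the training set."""
    pair_counts = defaultdict(Counter)
    for tokens in tokenized_docs:
        for first, second in zip(tokens, tokens[1:]):
            pair_counts[first][second] += 1
    return pair_counts


def extend_features(tokens, pair_counts, threshold=0.5):
    """Append words absent from the text whose estimated P(next | word) >= threshold.

    The threshold value is an assumption made for this sketch.
    """
    present = set(tokens)
    extended = list(tokens)
    for word in tokens:  # only original tokens trigger extensions (no chaining)
        followers = pair_counts.get(word)
        if not followers:
            continue
        total = sum(followers.values())
        for candidate, count in followers.items():
            if candidate not in present and count / total >= threshold:
                extended.append(candidate)
                present.add(candidate)
    return extended


# Toy training data, invented purely for illustration.
train = [
    ["stock", "market", "falls"],
    ["stock", "market", "rises"],
    ["football", "match", "tonight"],
]
library = build_bigram_library(train)
print(extend_features(["stock", "news"], library))
```

The extended token lists could then be vectorized and passed to a Naive Bayes classifier in place of the original short texts; how the paper weights or filters the added words is not stated in the abstract.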
Pages: 710-716
Page count: 7