A Short Text Classification Method Based on N-Gram and CNN

Cited by: 33
Authors
Wang, Haitao [1]
He, Jie [1]
Zhang, Xiaohong [1]
Liu, Shufen [2]
Affiliations
[1] Henan Polytech Univ, Coll Comp Sci & Technol, Jiaozuo 454000, Henan, Peoples R China
[2] Jilin Univ, Coll Comp Sci & Technol, Changchun 130012, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Short text; Classification; Convolution neural network; N-gram; Concentration mechanism;
DOI
10.1049/cje.2020.01.001
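Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology];
Discipline classification code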
0808 ; 0809 ;
Abstract
Text classification is a fundamental task in natural language processing (NLP) applications. Most existing work relies on either explicit or implicit text representations to address this problem. These techniques work well for sentences, but they cannot simply be applied to short text because of its shortness and sparseness. Traditional multi-size-filter convolutional neural networks (CNNs) obtain only simple word-vector features and ignore important features during text classification. To address this, we propose a CNN-based short text classification model that obtains rich text features by adopting a non-linear sliding method and an N-gram language model, and selects the key features by using a concentration mechanism. In addition, a pooling operation preserves as many of the text features as possible. Experiments show that, compared with traditional machine learning algorithms and convolutional neural networks, the proposed method markedly improves short text classification results.
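As a rough illustration of two ingredients named in the abstract, the sketch below extracts word N-grams with a stride-1 sliding window and applies max pooling over window scores. It is not the authors' model: the function names (`word_ngrams`, `multi_size_ngrams`, `max_over_time`) and the window sizes are assumptions made for this example, and the non-linear sliding method and concentration mechanism are not reproduced because the abstract does not specify them.

```python
from typing import Dict, List, Sequence, Tuple


def word_ngrams(tokens: Sequence[str], n: int) -> List[Tuple[str, ...]]:
    """Return every contiguous n-gram of `tokens` (stride-1 sliding window)."""
    if n <= 0:
        raise ValueError("n must be positive")
    # A text shorter than n yields an empty list, which is common for short text.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def multi_size_ngrams(
    tokens: Sequence[str], sizes: Sequence[int] = (1, 2, 3)
) -> Dict[int, List[Tuple[str, ...]]]:
    """Collect n-grams for several window sizes (hypothetical default sizes),
    loosely mirroring how a multi-size-filter CNN looks at windows of
    different widths."""
    return {n: word_ngrams(tokens, n) for n in sizes}


def max_over_time(scores: Sequence[float]) -> float:
    """Max pooling: keep the strongest score among all window positions."""
    return max(scores) if scores else 0.0


if __name__ == "__main__":
    tokens = "cheap flights to paris".split()
    for n, grams in multi_size_ngrams(tokens).items():
        print(n, grams)
    # Toy scores standing in for one filter's response at each window position.
    print(max_over_time([0.1, 0.9, 0.3]))
```

In a real CNN the per-window scores would come from learned filters applied to word embeddings; here they are hand-written numbers so that the pooling step can be seen in isolation.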
Pages: 248-254
Page count: 7