A Short Text Classification Method Based on N-Gram and CNN

Cited by: 0
Authors
WANG Haitao [1]
HE Jie [1]
ZHANG Xiaohong [1]
LIU Shufen [2]
Affiliations
[1] College of Computer Science and Technology, Henan Polytechnic University
[2] College of Computer Science and Technology, Jilin University
Funding
National Natural Science Foundation of China
Keywords
Short text; Classification; Convolution neural network; N-gram; Concentration mechanism;
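DOI
Not available
Chinese Library Classification (CLC) number
TP391.1 [Text information processing]; TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 081203; 0835; 1405
Abstract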
Text classification is a fundamental task in natural language processing (NLP) applications. Most existing work relies on either explicit or implicit text representations to address this problem. These techniques work well for sentences but cannot simply be applied to short text because of its shortness and sparseness. A traditional multi-size-filter convolutional neural network (CNN) obtains only simple word-vector features and ignores important features during text classification. To address this, we propose a CNN-based short text classification model that obtains abundant text features by adopting a nonlinear sliding method and an N-gram language model, and selects the key features by using a concentration mechanism. In addition, a pooling operation is employed to preserve the text features as far as possible. Experiments show that, compared with traditional machine learning algorithms and convolutional neural networks, the proposed method markedly improves short text classification results.
Pages: 248-254
Page count: 7
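
The abstract mentions N-gram features, convolution filters, and a pooling operation, but this record gives no implementation details. The following is only a minimal illustrative sketch of the generic building blocks (contiguous word n-grams, one convolution filter slid over token embeddings, and max-over-time pooling). It is not the authors' model: the nonlinear sliding method and the concentration mechanism are not reproduced, and the function names, toy embeddings, and filter weights are all hypothetical.

```python
from typing import List, Sequence, Tuple


def word_ngrams(tokens: Sequence[str], n: int) -> List[Tuple[str, ...]]:
    """Return all contiguous word n-grams of size n (empty if the text is shorter than n)."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def conv_max_pool(embedded: Sequence[Sequence[float]],
                  kernel: Sequence[Sequence[float]]) -> float:
    """Slide one filter over the token embeddings and keep the maximum response.

    `embedded` holds one vector per token; `kernel` holds one weight vector per
    window position, so len(kernel) is the n-gram width of the filter.
    """
    width = len(kernel)
    responses = []
    for start in range(len(embedded) - width + 1):
        window = embedded[start:start + width]
        # Dot product between the filter and the current n-gram window.
        score = sum(w * x
                    for k_row, x_row in zip(kernel, window)
                    for w, x in zip(k_row, x_row))
        responses.append(max(score, 0.0))  # ReLU activation
    # Max-over-time pooling; 0.0 when the text is shorter than the filter.
    return max(responses, default=0.0)


# Hypothetical toy input: five tokens with made-up 2-dimensional embeddings.
tokens = "cheap flights to new york".split()
toy_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5], [0.0, 2.0]]
bigram_filter = [[1.0, 0.0], [0.0, 1.0]]  # assumed weights, width 2

print(word_ngrams(tokens, 2))
print(conv_max_pool(toy_embeddings, bigram_filter))  # prints 2.5
```

In a multi-size-filter CNN, one such pooled value would typically be computed per filter and per window width, and the resulting values concatenated into the feature vector passed to the classifier.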