A Short Text Classification Method Based on N-Gram and CNN

Cited by: 33

Authors:

Wang, Haitao [1]

He, Jie [1]

Zhang, Xiaohong [1]

Liu, Shufen [2]

Affiliations:

[1] Henan Polytech Univ, Coll Comp Sci & Technol, Jiaozuo 454000, Henan, Peoples R China

[2] Jilin Univ, Coll Comp Sci & Technol, Changchun 130012, Peoples R China

Source:

CHINESE JOURNAL OF ELECTRONICS | 2020 / Vol. 29 / Issue 02

Funding:

National Natural Science Foundation of China;

Keywords:

Short text; Classification; Convolution neural network; N-gram; Concentration mechanism;

DOI:

10.1049/cje.2020.01.001

Chinese Library Classification (CLC) number:

TM [Electrical engineering]; TN [Electronics and communication technology];

Discipline classification codes:

0808 ; 0809 ;

Abstract:

Text classification is a fundamental task in natural language processing (NLP) applications. Most existing work relies on either explicit or implicit text representations to address this problem. These techniques work well for sentences, but they cannot be applied directly to short text because of its brevity and sparseness. A traditional multi-size-filter convolutional neural network (CNN) obtains only simple word-vector features and ignores important features during text classification. To address this, we propose a CNN-based short text classification model that obtains rich text features by adopting a nonlinear sliding method and an N-gram language model, and picks out the key features by using a concentration mechanism. In addition, a pooling operation preserves the text features as far as possible. Experiments show that, compared with traditional machine learning algorithms and convolutional neural networks, the proposed method markedly improves classification results on short text.
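The abstract names three building blocks: N-gram extraction with a sliding window, a weighting step that picks out key features, and pooling. The following is a minimal sketch of those three ideas in plain Python, not the authors' model. The function names (`word_ngrams`, `attention_pool`, `max_pool`) are hypothetical, and treating the "concentration mechanism" as a softmax weighting over feature vectors is an assumption, since the abstract does not define it.

```python
import math
from typing import List, Sequence, Tuple


def word_ngrams(tokens: Sequence[str], n: int) -> List[Tuple[str, ...]]:
    """Slide a window of width n over the tokens, one token at a time."""
    if n <= 0:
        raise ValueError("n must be positive")
    # A text shorter than n yields no n-grams (empty range).
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a non-empty list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(features: Sequence[Sequence[float]],
                   scores: Sequence[float]) -> List[float]:
    """Weighted sum of feature vectors, with weights softmax(scores).

    Assumption: this stands in for the paper's "concentration mechanism".
    """
    weights = softmax(scores)
    dim = len(features[0])
    return [sum(w * f[d] for w, f in zip(weights, features))
            for d in range(dim)]


def max_pool(features: Sequence[Sequence[float]]) -> List[float]:
    """Max-over-positions pooling: keep the largest value per dimension."""
    return [max(column) for column in zip(*features)]


if __name__ == "__main__":
    tokens = "cheap flights to new york".split()
    print(word_ngrams(tokens, 2))
    print(word_ngrams(tokens, 3))
```

In a real model the feature vectors would come from convolution filters applied to embedded N-grams, and the scores would be learned. Here both are supplied by the caller to keep the sketch self-contained.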


Pages: 248-254

Page count: 7

