n-BiLSTM: BiLSTM with n-gram Features for Text Classification

Cited by: 0

Authors:

Zhang, Yunxiang [1]

Rao, Zhuyi [1]

Affiliations:

[1] Shenzhen Power Supply Bureau Co., Ltd., Shenzhen, People's Republic of China

Source:

Proceedings of 2020 IEEE 5th Information Technology and Mechatronics Engineering Conference (ITOEC 2020) | 2020

Keywords:

text classification; n-gram; bidirectional long short-term memory; deep learning

DOI:

Not available

Chinese Library Classification:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

Text classification is widely used in fields such as e-commerce and log message analysis, and it is an essential module in text processing tasks. In this paper, we present a method for building an accurate and fast text classification system in both one-vs.-one and one-vs.-rest settings. Our approach, named n-BiLSTM, converts natural-language sentences into bag-of-words-like features using n-gram techniques and then feeds these features into a bidirectional LSTM. Together, the two components take better advantage of multi-scale feature representation and context information. The whole system is evaluated on two labeled movie review datasets, IMDB and SSTb, to test one-vs.-one and one-vs.-rest performance, respectively. The results show that n-BiLSTM outperforms the basic LSTM and bidirectional LSTM algorithms.
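The abstract does not say how the n-gram features are built, so the following is only a minimal sketch of the first stage under stated assumptions: word-level n-grams up to order 2, a vocabulary built from the training sentences, and id 0 reserved for unseen n-grams. The function names `extract_ngrams`, `build_vocab`, and `encode` are hypothetical and are not taken from the paper. The resulting id sequence is the kind of input that could be passed through an embedding layer into a bidirectional LSTM; that second stage is not shown here.

```python
from typing import Dict, List


def extract_ngrams(tokens: List[str], max_n: int = 2) -> List[str]:
    """Return all word n-grams of order 1..max_n (assumed word-level, not character-level)."""
    ngrams = []
    for n in range(1, max_n + 1):
        # Slide a window of width n across the sentence.
        for i in range(len(tokens) - n + 1):
            ngrams.append(" ".join(tokens[i:i + n]))
    return ngrams


def build_vocab(corpus: List[List[str]], max_n: int = 2) -> Dict[str, int]:
    """Assign an integer id to every n-gram in the corpus; id 0 is kept for unseen n-grams."""
    vocab: Dict[str, int] = {}
    for tokens in corpus:
        for gram in extract_ngrams(tokens, max_n):
            if gram not in vocab:
                vocab[gram] = len(vocab) + 1
    return vocab


def encode(tokens: List[str], vocab: Dict[str, int], max_n: int = 2) -> List[int]:
    """Turn a tokenized sentence into a sequence of n-gram ids (0 for unknown)."""
    return [vocab.get(gram, 0) for gram in extract_ngrams(tokens, max_n)]


if __name__ == "__main__":
    train = [["good", "movie"], ["bad", "movie"]]
    vocab = build_vocab(train)
    print(encode(["good", "film"], vocab))
```

Mixing unigrams and bigrams in one sequence is one plausible reading of "multi-scale feature representation"; the paper may combine the scales differently.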

Pages: 1056-1059

Page count: 4
