A POS Tagging Model for Vietnamese Social Media Text Using BiLSTM-CRF with Rich Features

Cited by: 3
Authors
Ngo Xuan Bach [1]
Trieu Khuong Duy [1]
Tu Minh Phuong [1]
Affiliations
[1] Posts & Telecommun Inst Technol, Dept Comp Sci, Hanoi, Vietnam
Keywords
Part-of-speech tagging; Social media text; Bidirectional long short-term memory; Conditional random field
DOI
10.1007/978-3-030-29894-4_16
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
This paper addresses part-of-speech (POS) tagging for Vietnamese social media text, which poses several challenges compared with tagging conventional text. We introduce a POS tagging model that takes advantage of both deep learning and manually engineered features to overcome these challenges. The main part of the model consists of several bidirectional long short-term memory (BiLSTM) layers that learn intermediate representations of sentences from features extracted at both the character and the word levels. A conditional random field (CRF) is then used on top of those BiLSTM layers, at the inference layer, to predict the most suitable POS tags. We leverage various types of manually engineered features in addition to automatically learned features to capture the characteristics of Vietnamese social media data and thereby improve the performance of the model. Experimental results on a public POS tagging corpus for Vietnamese social media text show that our model outperforms previous work [4] by a large margin, reaching 91.9% accuracy with a 27% error rate reduction. The results also demonstrate the effectiveness of combining automatically learned and manually designed features in a deep learning framework when only a limited amount of training data is available.
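The CRF inference step mentioned in the abstract can be illustrated with a minimal Viterbi decoding sketch. It is not the authors' implementation: the function name `viterbi_decode`, the score layout, and the toy numbers are assumptions made for illustration. The sketch assumes that some upstream network (for example, stacked BiLSTM layers) has already produced a per-token score for every tag, and that the CRF contributes a tag-to-tag transition score.

```python
from typing import List, Sequence


def viterbi_decode(emissions: Sequence[Sequence[float]],
                   transitions: Sequence[Sequence[float]]) -> List[int]:
    """Return the highest-scoring tag index sequence for one sentence.

    emissions[t][j]   : score of tag j at token position t
                        (assumed to come from an upstream encoder).
    transitions[i][j] : score of moving from tag i to tag j
                        (assumed to be learned CRF parameters).
    """
    if not emissions:
        return []

    num_tags = len(emissions[0])
    # best[j] holds the best score of any path ending in tag j
    # at the current position.
    best = list(emissions[0])
    backpointers: List[List[int]] = []

    for t in range(1, len(emissions)):
        new_best = []
        pointers = []
        for j in range(num_tags):
            # Pick the previous tag that maximizes the path score into tag j.
            prev = max(range(num_tags),
                       key=lambda i: best[i] + transitions[i][j])
            pointers.append(prev)
            new_best.append(best[prev] + transitions[prev][j] + emissions[t][j])
        best = new_best
        backpointers.append(pointers)

    # Backtrack from the best final tag to recover the full path.
    last = max(range(num_tags), key=lambda j: best[j])
    path = [last]
    for pointers in reversed(backpointers):
        last = pointers[last]
        path.append(last)
    path.reverse()
    return path
```

With a hypothetical two-tag inventory, calling `viterbi_decode(emissions, transitions)` returns one tag index per token. A strongly negative transition score can override a locally preferred tag, which is the reason for decoding the sequence jointly rather than choosing each tag independently.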
Pages: 206-219
Page count: 14