Joint Chinese Word Segmentation and POS Tagging Using an Error-Driven Word-Character Hybrid Model

Cited by: 13
Authors
Kruengkrai, Canasai [1, 2]
Uchimoto, Kiyotaka [1]
Kazama, Jun'ichi [1]
Wang, Yiou [1]
Torisawa, Kentaro [1]
Isahara, Hitoshi [1, 2]
Affiliations
[1] Kobe Univ, Grad Sch Engn, Kobe, Hyogo 6578501, Japan
[2] Natl Inst Informat & Commun Technol, Kyoto 6190289, Japan
Keywords
word segmentation; POS tagging; error-driven; word-character hybrid model
DOI
10.1587/transinf.E92.D.2298
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In this paper, we present a discriminative word-character hybrid model for joint Chinese word segmentation and POS tagging. Our word-character hybrid model offers high performance since it can handle both known and unknown words. We describe our strategies that yield a good balance for learning the characteristics of known and unknown words, and propose an error-driven policy that delivers such a balance by acquiring examples of unknown words from particular errors in a training corpus. We describe an efficient framework for training our model based on the Margin Infused Relaxed Algorithm (MIRA), evaluate our approach on the Penn Chinese Treebank, and show that it achieves superior performance compared to the state-of-the-art approaches reported in the literature.
Pages: 2298-2305
Page count: 8
Related papers
46 in total
  • [11] Word-character attention model for Chinese text classification
    Xue Qiao
    Chen Peng
    Zhen Liu
    Yanfeng Hu
    International Journal of Machine Learning and Cybernetics, 2019, 10 : 3521 - 3537
  • [12] Word-character attention model for Chinese text classification
    Qiao, Xue
    Peng, Chen
    Liu, Zhen
    Hu, Yanfeng
    INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2019, 10 (12) : 3521 - 3537
  • [13] Joint Word Segmentation, POS-Tagging and Syntactic Chunking
    Lyu, Chen
    Zhang, Yue
    Ji, Donghong
    THIRTIETH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2016, : 3007 - 3014
  • [14] LM Enhanced BiRNN-CRF for Joint Chinese Word Segmentation and POS Tagging
    Zhang, Jianhu
    Liu, Gongshen
    Zhou, Jie
    Zhou, Cheng
    Sun, Huanrong
    NATURAL LANGUAGE PROCESSING AND CHINESE COMPUTING, NLPCC 2018, PT II, 2018, 11109 : 105 - 116
  • [15] A Fine-Grained Domain Adaption Model for Joint Word Segmentation and POS Tagging
    Jiang, Peijie
    Long, Dingkun
    Sun, Yueheng
    Zhang, Meishan
    Xu, Guangwei
    Xie, Pengjun
    2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021), 2021, : 3587 - 3598
  • [16] Encoding multi-granularity structural information for joint Chinese word segmentation and POS tagging
    Zhao, Ling
    Zhang, Ailian
    Liu, Ying
    Fei, Hao
    PATTERN RECOGNITION LETTERS, 2020, 138 (138) : 163 - 169
  • [17] Bidirectional Deep Learning of Context Representation for Joint Word Segmentation and POS Tagging
    Boonkwan, Prachya
    Supnithi, Thepchai
    ADVANCED COMPUTATIONAL METHODS FOR KNOWLEDGE ENGINEERING, ICCSAMA 2017, 2018, 629 : 184 - 196
  • [18] Simple semi-supervised learning for chinese word segmentation and pos tagging
    Li, Xinxin
    Wang, Xuan
    Waqas, Muhammad
    Harbin, Anwar
    Information Technology Journal, 2013, 12 (20) : 5955 - 5961
  • [19] A unified character-based tagging framework for chinese word segmentation
    Zhao H.
    Huang C.-N.
    Li Mu.
    Lu B.L.
    ACM Transactions on Asian Language Information Processing, 2010, 9 (02):
  • [20] Research on the Method and System of Word Segmentation and POS Tagging for Ancient Chinese Medicine Literature
    Fu, Xianjun
    Yuan, Ting
    Li, Xuebo
    Wang, Zhenguo
    Zhou, Yang
    Ju, Fangning
    Li, Jintong
    Chen, Xiaokang
    Sang, Xiaoming
    2019 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE (BIBM), 2019, : 2493 - 2498