Joint Chinese Word Segmentation and POS Tagging Using an Error-Driven Word-Character Hybrid Model

Cited by: 13
Authors
Kruengkrai, Canasai [1, 2]
Uchimoto, Kiyotaka [1 ]
Kazama, Jun'ichi [1 ]
Wang, Yiou [1 ]
Torisawa, Kentaro [1 ]
Isahara, Hitoshi [1, 2]
Affiliations
[1] Kobe Univ, Grad Sch Engn, Kobe, Hyogo 6578501, Japan
[2] Natl Inst Informat & Commun Technol, Kyoto 6190289, Japan
Keywords
word segmentation; POS tagging; error-driven; word-character hybrid model
DOI
10.1587/transinf.E92.D.2298
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In this paper, we present a discriminative word-character hybrid model for joint Chinese word segmentation and POS tagging. Our word-character hybrid model offers high performance because it can handle both known and unknown words. We describe our strategies that yield a good balance for learning the characteristics of known and unknown words, and propose an error-driven policy that delivers such a balance by acquiring examples of unknown words from particular errors in a training corpus. We describe an efficient framework for training our model based on the Margin Infused Relaxed Algorithm (MIRA), evaluate our approach on the Penn Chinese Treebank, and show that it achieves superior performance compared to the state-of-the-art approaches reported in the literature.
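The abstract names MIRA as the training algorithm but gives no details of the update. As a rough illustration only, the sketch below shows the common single-constraint (1-best) form of a MIRA update over sparse feature vectors. The function names, the clipping constant `C`, the feature keys, and the use of a single predicted candidate are assumptions made for this example; they are not taken from the paper, which may use a different variant.

```python
from collections import Counter


def dot(weights, feats):
    """Sparse dot product between a weight dict and a feature dict."""
    return sum(weights.get(k, 0.0) * v for k, v in feats.items())


def mira_update(weights, feats_gold, feats_pred, loss, C=1.0):
    """Single-constraint MIRA step (hypothetical helper, not the paper's code).

    Moves `weights` just far enough that the gold analysis outscores the
    predicted one by a margin of `loss`, with the step size clipped at C.
    Returns the step size that was applied.
    """
    # Feature difference between the gold and the predicted analysis.
    delta = Counter(feats_gold)
    delta.subtract(feats_pred)
    delta = {k: v for k, v in delta.items() if v != 0}

    norm_sq = sum(v * v for v in delta.values())
    if norm_sq == 0:
        return 0.0  # identical feature vectors: nothing to update

    margin = dot(weights, delta)
    tau = min(C, max(0.0, loss - margin) / norm_sq)
    for k, v in delta.items():
        weights[k] = weights.get(k, 0.0) + tau * v
    return tau


# Toy usage with made-up feature names.
w = {}
gold = {"word=中国": 1, "tag=NR": 1}
pred = {"word=中国": 1, "tag=NN": 1}
step = mira_update(w, gold, pred, loss=1.0)
```

After one step on this toy input, the gold analysis outscores the prediction by exactly the requested margin, so repeating the same update leaves the weights unchanged.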
Pages: 2298-2305
Page count: 8