Joint Chinese Word Segmentation and POS Tagging Using an Error-Driven Word-Character Hybrid Model

Cited by: 13
Authors
Kruengkrai, Canasai [1, 2]
Uchimoto, Kiyotaka [1]
Kazama, Jun'ichi [1]
Wang, Yiou [1]
Torisawa, Kentaro [1]
Isahara, Hitoshi [1, 2]
Affiliations
[1] Kobe Univ, Grad Sch Engn, Kobe, Hyogo 6578501, Japan
[2] Natl Inst Informat & Commun Technol, Kyoto 6190289, Japan
Keywords
word segmentation; POS tagging; error-driven; word-character hybrid model;
DOI
10.1587/transinf.E92.D.2298
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
In this paper, we present a discriminative word-character hybrid model for joint Chinese word segmentation and POS tagging. Our word-character hybrid model offers high performance since it can handle both known and unknown words. We describe our strategies that yield good balance for learning the characteristics of known and unknown words and propose an error-driven policy that delivers such balance by acquiring examples of unknown words from particular errors in a training corpus. We describe an efficient framework for training our model based on the Margin Infused Relaxed Algorithm (MIRA), evaluate our approach on the Penn Chinese Treebank, and show that it achieves superior performance compared to the state-of-the-art approaches reported in the literature.
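The abstract names the Margin Infused Relaxed Algorithm (MIRA) as the training framework but gives no update rule. As a rough illustration only, the following is a minimal sketch of a generic single-constraint (1-best) MIRA-style update over sparse feature vectors. It is not necessarily the variant used in the paper; the function names, the feature keys, and the clipping constant `C` are all assumptions introduced here.

```python
from typing import Dict

FeatureVector = Dict[str, float]  # hypothetical sparse feature representation


def dot(weights: FeatureVector, feats: FeatureVector) -> float:
    """Sparse dot product between a weight vector and a feature vector."""
    return sum(weights.get(k, 0.0) * v for k, v in feats.items())


def mira_update(weights: FeatureVector,
                gold_feats: FeatureVector,
                pred_feats: FeatureVector,
                loss: float,
                C: float = 1.0) -> float:
    """Apply one 1-best MIRA-style step in place and return the step size.

    Makes the smallest change to `weights` such that the gold analysis
    outscores the predicted one by a margin of `loss`, with the step
    size clipped to the range [0, C].
    """
    # Feature difference between the gold and the predicted analysis.
    diff = dict(gold_feats)
    for k, v in pred_feats.items():
        diff[k] = diff.get(k, 0.0) - v

    norm_sq = sum(v * v for v in diff.values())
    if norm_sq == 0.0:
        return 0.0  # identical feature vectors: nothing to update

    # How far the current weights fall short of the required margin.
    violation = loss - dot(weights, diff)
    tau = min(C, max(0.0, violation / norm_sq))

    for k, v in diff.items():
        weights[k] = weights.get(k, 0.0) + tau * v
    return tau
```

In a joint segmentation and tagging setting, `gold_feats` and `pred_feats` would be the summed features of the gold and predicted paths, and `loss` would count their disagreements. Those choices are left open here because the abstract does not specify them.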
Pages: 2298-2305
Page count: 8
Related papers
46 in total
  • [21] Mandarin word-character hybrid-input Neural Network Language Model
    Kang, Moonyoung
    Tim Ng
    Long Nguyen
    12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011, : 632 - 635
  • [22] J-TranPSP: A Joint Transition-based Model for Prosodic Structure Prediction, Word Segmentation and PoS Tagging
    Shen, Binbin
    Luan, Jian
    Zhang, Shengyan
    Shen, Quanbo
    Wang, Yujun
    2022 13TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2022, : 280 - 284
  • [23] Thai Personal Named Entity Extraction without using Word Segmentation or POS Tagging
    Sutheebanjard, P.
    Premchaiswadi, W.
    2009 EIGHTH INTERNATIONAL SYMPOSIUM ON NATURAL LANGUAGE PROCESSING, PROCEEDINGS, 2009, : 221 - 226
  • [24] A Feature-Enriched Neural Model for Joint Chinese Word Segmentation and Part-of-Speech Tagging
    Chen, Xinchi
    Qiu, Xipeng
    Huang, Xuanjing
    PROCEEDINGS OF THE TWENTY-SIXTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2017, : 3960 - 3966
  • [25] Incorporating knowledge for joint Chinese word segmentation and part-of-speech tagging with SynSemGCN
    Tang, Xuemei
    Wang, Jun
    Su, Qi
    ASLIB JOURNAL OF INFORMATION MANAGEMENT, 2024,
  • [26] How Unsupervised Learning Affects Character Tagging based Chinese Word Segmentation: A Quantitative Investigation
    Song, Yan
    Kit, Chunyu
    Xu, Ruifeng
    Zhao, Hai
    PROCEEDINGS OF 2009 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-6, 2009, : 3481 - 3486
  • [27] Research on the model of integrating Chinese word segmentation with part-of-speech tagging
    Tong, Xiaojun
    Cui, Minggen
    Song, Guolong
    DCABES 2007 Proceedings, Vols I and II, 2007, : 1062 - 1065
  • [28] Overview of the NLPCC 2015 Shared Task: Chinese Word Segmentation and POS Tagging for Micro-blog Texts
    Qiu, Xipeng
    Qian, Peng
    Yin, Liusong
    Wu, Shiyu
    Huang, Xuanjing
    NATURAL LANGUAGE PROCESSING AND CHINESE COMPUTING, NLPCC 2015, 2015, 9362 : 541 - 549
  • [29] A Deep Convolutional Neural Model for Character-Based Chinese Word Segmentation
    Xie, Zhipeng
    Hu, Junfeng
    NATURAL LANGUAGE PROCESSING AND CHINESE COMPUTING, NLPCC 2017, 2018, 10619 : 380 - 392
  • [30] Character-based Joint Word Segmentation and Part-of-Speech Tagging for Tibetan Based on Deep Learning
    Li, Yan
    Li, Xiaomin
    Wang, Yiru
    Lv, Hui
    Li, Fenfang
    Duo, La
    ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2022, 21 (05)