Scaling conditional random field with application to Chinese word segmentation

Authors
Zhao, Hai [1]
Kit, Chunyu [1]
Affiliations
[1] City Univ Hong Kong, Dept Chinese Translat & Linguist, 83 Tat Chee Ave, Kowloon, Hong Kong, Peoples R China
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
As a powerful sequence labeling model, the conditional random field (CRF) has been applied successfully to a number of natural language processing (NLP) tasks. However, the high complexity of CRF training allows only a very small tag (or label) set, because training becomes intractable as the tag set grows. This paper proposes an improved decomposed training and joint decoding algorithm for CRF learning. Instead of training a single CRF model for all tags, it trains a binary sub-CRF independently for each tag. A predicted tag sequence is then produced by a joint decoding algorithm based on the probabilistic output of all the sub-CRFs involved. To test its effectiveness, the approach is applied to Chinese word segmentation (CWS) treated as a character tagging problem. Our evaluation shows that it reduces time and memory cost by 20-39% and 44-50%, respectively, without any significant performance loss on various large-scale data sets.
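The joint decoding step described in the abstract can be illustrated with a minimal sketch. It assumes that each binary sub-CRF has already produced, for every character position, a probability that its tag applies. The 4-tag B/M/E/S set, the transition constraints, and the function name `joint_decode` are illustrative assumptions, not details taken from the paper. The Viterbi-style combination shown is one plausible way to merge the sub-CRF outputs and should not be read as the authors' algorithm.

```python
import math

# Hypothetical 4-tag set for character tagging: Begin, Middle, End, Single.
TAGS = ("B", "M", "E", "S")

# Assumed legal transitions: previous tag -> tags allowed to follow it.
ALLOWED = {
    "B": {"M", "E"},
    "M": {"M", "E"},
    "E": {"B", "S"},
    "S": {"B", "S"},
}
START = ("B", "S")  # a sentence must begin a word
END = ("E", "S")    # and must finish one


def joint_decode(probs):
    """Combine per-tag probabilities into one legal tag sequence.

    probs: one dict per character, mapping tag -> probability reported by
    that tag's binary sub-model (the values need not sum to 1).
    Returns the legal sequence with the highest sum of log-probabilities.
    """
    if not probs:
        return []
    neg_inf = float("-inf")

    def logp(i, tag):
        p = probs[i].get(tag, 0.0)
        return math.log(p) if p > 0 else neg_inf

    # score[i][t]: best log-score of a legal prefix ending in tag t at i.
    score = [{t: (logp(0, t) if t in START else neg_inf) for t in TAGS}]
    back = [{}]
    for i in range(1, len(probs)):
        score.append({})
        back.append({})
        for t in TAGS:
            best_prev, best = None, neg_inf
            for prev in TAGS:
                if t in ALLOWED[prev] and score[i - 1][prev] > best:
                    best_prev, best = prev, score[i - 1][prev]
            score[i][t] = best + logp(i, t) if best_prev else neg_inf
            back[i][t] = best_prev

    last = max(END, key=lambda t: score[-1][t])
    if score[-1][last] == neg_inf:
        raise ValueError("no legal tag sequence has non-zero probability")

    # Follow back-pointers from the best final tag.
    path = [last]
    for i in range(len(probs) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]
```

In this sketch, picking the most probable tag at each position independently could yield an illegal sequence such as B followed by M at the end of a sentence; the transition constraints rule that out during decoding.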
Pages: 95 / +
Page count: 3