Chinese word segmentation based on A-priori and adjacent characters

Cited by: 0
Authors
Wang, Y [1]
Huang, ST [1]
Affiliations
[1] Shanghai Jiao Tong Univ, Dept Comp Sci & Engn, Shanghai 200030, Peoples R China
Keywords
word segmentation; adjacent characters; n-grams; A-priori
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Chinese word segmentation is an important and difficult problem because written Chinese does not separate words with spaces. This paper presents a segmentation algorithm based on adjacent characters and A-priori. The method uses information about adjacent characters to join n-grams with their neighboring characters. Experimental results on the LDC95T13 Chinese collection show that the new method performs remarkably better than mutual-information-based methods.
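The abstract does not give the algorithm's details, so the following is only a generic sketch of the idea it names, not the authors' method. It assumes an Apriori-style rule (an n-gram is counted only if both of its (n-1)-gram substrings are already frequent) to grow n-grams by one adjacent character at a time, and then segments with a simple greedy longest match. The function names `frequent_ngrams` and `segment`, the thresholds `min_count` and `max_len`, and the toy corpus are all hypothetical.

```python
from collections import Counter


def frequent_ngrams(corpus, min_count=2, max_len=4):
    """Collect frequent character n-grams level by level (Apriori-style).

    An n-gram is a candidate only if its prefix and suffix of length n-1
    were both frequent at the previous level. This pruning rule is an
    assumption made for illustration.
    """
    counts = Counter(ch for line in corpus for ch in line)
    current = {g for g, c in counts.items() if c >= min_count}
    frequent = {g: counts[g] for g in current}
    n = 1
    while current and n < max_len:
        n += 1
        counts = Counter()
        for line in corpus:
            for i in range(len(line) - n + 1):
                gram = line[i:i + n]
                # Join a frequent (n-1)-gram with its adjacent character
                # only when the overlapping (n-1)-gram is frequent as well.
                if gram[:-1] in current and gram[1:] in current:
                    counts[gram] += 1
        current = {g for g, c in counts.items() if c >= min_count}
        frequent.update({g: counts[g] for g in current})
    return frequent


def segment(text, frequent, max_len=4):
    """Greedy longest-match segmentation using the frequent n-grams."""
    words = []
    i = 0
    while i < len(text):
        for n in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + n]
            # Single characters are always accepted as a fallback.
            if n == 1 or piece in frequent:
                words.append(piece)
                i += n
                break
    return words


# Toy corpus (hypothetical data, not from LDC95T13).
corpus = ["我爱北京", "北京很大", "我在北京"]
lexicon = frequent_ngrams(corpus, min_count=2)
print(segment("我爱北京", lexicon))
```

On this toy corpus only the bigram 北京 survives the frequency threshold, so it is kept as one word while the remaining characters are emitted individually.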
Pages: 3808-3813
Page count: 6
Related papers
50 in total
  • [1] Semidefinite clustering for image segmentation with a-priori knowledge
    Heiler, M
    Keuchel, J
    Schnörr, C
    [J]. PATTERN RECOGNITION, PROCEEDINGS, 2005, 3663 : 309 - 317
  • [2] CRFs based Chinese word segmentation
    Gui, Kunzhi
    Ren, Yong
    Peng, Zhaomeng
    [J]. MECHATRONICS ENGINEERING, COMPUTING AND INFORMATION TECHNOLOGY, 2014, 556-562 : 4376 - 4379
  • [3] A Study of Chinese Word Segmentation Based on the Characteristics of Chinese
    Han, Aaron Li-Feng
    Wong, Derek F.
    Chao, Lidia S.
    He, Liangye
    Zhu, Ling
    Li, Shuo
    [J]. LANGUAGE PROCESSING AND KNOWLEDGE IN THE WEB, 2013, 8105 : 111 - 118
  • [4] Ancient Books Chinese Characters Segmentation Based on Connected Domain and Chinese Characters Feature
    Zhu Lei
    Yang Jing
    [J]. SMART MATERIALS AND INTELLIGENT SYSTEMS, PTS 1 AND 2, 2011, 143-144 : 227 - +
  • [5] Exploiting shared Chinese characters in Chinese word segmentation optimization for Chinese-Japanese Machine Translation
    Chu, Chenhui
    Nakazawa, Toshiaki
    Kawahara, Daisuke
    Kurohashi, Sadao
    [J]. Proceedings of the 16th Annual Conference of the European Association for Machine Translation, EAMT 2012, 2012, : 35 - 42
  • [6] A Word Segmentation Method of Ancient Chinese Based on Word Alignment
    Che, Chao
    Zhao, Hanyu
    Wu, Xiaoting
    Zhou, Dongsheng
    Zhang, Qiang
    [J]. NATURAL LANGUAGE PROCESSING AND CHINESE COMPUTING (NLPCC 2019), PT I, 2019, 11838 : 761 - 772
  • [7] A Chinese Word Segmentation Based on Machine Learning
    Wang Hongsheng
    Cui Mingming
    [J]. PROCEEDINGS OF THE FIRST INTERNATIONAL WORKSHOP ON EDUCATION TECHNOLOGY AND COMPUTER SCIENCE, VOL II, 2009, : 610 - 613
  • [8] Chinese Word Segmentation Based on Deep Learning
    Wang, Mengge
    Li, Xiaoge
    Wei, Zheng
    Zhi, Shuting
    Wang, Haoyue
    [J]. PROCEEDINGS OF 2018 10TH INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND COMPUTING (ICMLC 2018), 2018, : 16 - 20
  • [9] Chinese word segmentation based on contextual entropy
    Huang, JH
    Powers, D
    [J]. PACLIC 17: Language, Information and Computation, Proceedings, 2003, : 152 - 158
  • [10] STRUCTURAL ANALYSIS BASED STROKE SEGMENTATION FOR CHINESE CHARACTERS
    Lam, Josh H. M.
    Yam, Yeung
    [J]. PROCEEDINGS OF THE 48TH IEEE CONFERENCE ON DECISION AND CONTROL, 2009 HELD JOINTLY WITH THE 2009 28TH CHINESE CONTROL CONFERENCE (CDC/CCC 2009), 2009, : 3118 - 3123