A Study of Chinese Word Segmentation Based on the Characteristics of Chinese

Authors
Han, Aaron Li-Feng [1 ]
Wong, Derek F. [1 ]
Chao, Lidia S. [1 ]
He, Liangye [1 ]
Zhu, Ling [1 ]
Li, Shuo [1 ]
Affiliation
[1] Univ Macau, Dept Comp & Informat Sci, Macau, Peoples R China
Keywords
Natural language processing; Chinese word segmentation; Characteristics of Chinese; Optimized features
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper presents research on Chinese word segmentation (CWS). Segmenting Chinese text is difficult because written Chinese has no explicit word boundaries and because several kinds of ambiguity can lead to different segmentations. Conventional research tends to emphasize the algorithms employed and the workflow designed, with less discussion of the fundamental problems of CWS. In contrast, this paper first analyzes the characteristics of Chinese and several categories of ambiguity in Chinese to explore potential solutions. The selected conditional random field models are trained with a quasi-Newton algorithm to perform sequence labeling. To take into account as much contextual information as possible, an augmented and optimized feature set is developed. The experiments show promising evaluation scores compared with some related work.
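As an illustration of how CWS can be cast as sequence labeling over characters, the following is a minimal sketch, not the authors' implementation. The abstract does not state the tag scheme or the feature templates, so the B/M/E/S tags, the two-character context window, and the function names `words_to_tags`, `tags_to_words`, and `char_features` are all assumptions made for this example. Training a conditional random field on such features (for instance with a quasi-Newton optimizer such as L-BFGS) is not shown.

```python
from typing import Dict, List


def words_to_tags(words: List[str]) -> List[str]:
    """Map a segmented sentence to one B/M/E/S tag per character (assumed scheme)."""
    tags: List[str] = []
    for word in words:
        if len(word) == 1:
            tags.append("S")  # single-character word
        else:
            # Begin, zero or more Middle, End
            tags.extend(["B"] + ["M"] * (len(word) - 2) + ["E"])
    return tags


def tags_to_words(chars: str, tags: List[str]) -> List[str]:
    """Rebuild words from characters and their (predicted) tags."""
    words: List[str] = []
    current = ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("E", "S"):  # a word ends at this character
            words.append(current)
            current = ""
    if current:  # tolerate a tag sequence that ends mid-word
        words.append(current)
    return words


def char_features(chars: str, i: int, window: int = 2) -> Dict[str, str]:
    """Character unigram and bigram context features around position i.

    The window size and feature names are hypothetical, chosen only to show
    how contextual information can be exposed to a sequence labeler.
    """
    def at(j: int) -> str:
        return chars[j] if 0 <= j < len(chars) else "<PAD>"

    feats: Dict[str, str] = {}
    for off in range(-window, window + 1):
        feats[f"c[{off}]"] = at(i + off)
    for off in range(-window, window):
        feats[f"c[{off}]c[{off + 1}]"] = at(i + off) + at(i + off + 1)
    return feats


# Example: a gold segmentation turned into per-character training data.
words = ["我", "喜欢", "自然语言", "处理"]
sentence = "".join(words)
tags = words_to_tags(words)
features = [char_features(sentence, i) for i in range(len(sentence))]
```

Each character then contributes one feature dictionary and one tag, which is the input shape that a linear-chain sequence labeler typically expects. Widening the window or adding further templates is one way to bring in more context, at the cost of a larger feature space.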
Pages: 111-118
Number of pages: 8