Chinese Word Segmentation for Sub-character Representation

Authors
Zhang, Taozheng [1]
Shang, Chenyang [1]
Affiliation
[1] Communication University of China, School of Information and Communication Engineering, Beijing, People's Republic of China
Keywords
Chinese Word Segmentation; Bi-LSTM; DCNN; Sub-character
DOI
10.1109/ICISFALL51598.2021.9627454
Abstract
Bidirectional long short-term memory networks (Bi-LSTM) have become the main architecture for Chinese word segmentation because they capture sequential information in text. As a sequence model, however, Bi-LSTM is slow to train, whereas dilated convolutional neural networks (DCNN), which are designed to capture information over long spans, have a natural advantage in speed. In this paper, sub-character information is concatenated with the ordinary features to enrich the input. Multiple contrast experiments are designed to verify the effect of applying DCNN and of adding Conditional Random Fields (CRF). Experiments on the four datasets in SIGHAN 2005 show that the DCNN structure can improve word segmentation in terms of both F1 score and efficiency. The main advantage of the DCNN is that it is much faster than Bi-LSTM.
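The abstract's point that dilated convolutions reach long spans of text can be illustrated with a minimal sketch. It is not the authors' model: the kernel size, the dilation rates (1, 2, 4), and the function names `receptive_field` and `dilated_conv1d` are assumptions chosen for illustration only.

```python
def receptive_field(kernel_size, dilations):
    """Input positions visible to one output of a stack of dilated convolutions."""
    field = 1
    for d in dilations:
        # Each layer widens the view by (kernel_size - 1) * dilation positions.
        field += (kernel_size - 1) * d
    return field


def dilated_conv1d(sequence, kernel, dilation):
    """Zero-padded, same-length 1-D dilated convolution (odd kernel sizes only)."""
    k = len(kernel)
    half = (k // 2) * dilation  # padding needed on each side
    padded = [0.0] * half + list(sequence) + [0.0] * half
    return [
        # Kernel taps are spaced `dilation` positions apart.
        sum(kernel[j] * padded[i + j * dilation] for j in range(k))
        for i in range(len(sequence))
    ]


# Hypothetical settings: three layers, kernel size 3.
plain = receptive_field(3, [1, 1, 1])    # ordinary convolutions
dilated = receptive_field(3, [1, 2, 4])  # dilation doubling per layer
print(plain, dilated)
print(dilated_conv1d([1, 2, 3, 4, 5], [1, 1, 1], dilation=2))
```

With the same number of layers, the dilated stack sees more context per output position, and every position is computed independently of the others, which is why such a model can be parallelized in a way a recurrent Bi-LSTM cannot.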
Pages: 177-181
Page count: 5