Automatic Word Segmentation for Spoken Cantonese

Authors
Fung, Roxana [1]
Bigi, Brigitte [2]
Affiliations
[1] Hong Kong Polytech Univ, Dept Chinese & Bilingual Studies, Hong Kong, Hong Kong, Peoples R China
[2] Aix Marseille Univ, CNRS, Lab Parole & Langage, F-13100 Aix En Provence, France
Keywords
segmentation; automatic; Cantonese; software; corpus
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Although Cantonese is the most influential variety of Chinese other than Mandarin, only a limited number of Cantonese corpora are available for linguistic studies. Among the essential steps in building a corpus, word segmentation is necessary but highly challenging because Cantonese lacks clear word boundaries. This paper reports the construction and evaluation of an open-source automatic word segmenter for Cantonese. The tool is a component of the multilingual SPPAS program and is designed to be used directly by linguists. It is free software distributed under a GPL license. Its effectiveness was evaluated by segmenting samples of a spoken Cantonese corpus both manually and automatically with the tool, then comparing the results. The study found high precision and recall. Upon completion, the tool is expected to promote the development of more Cantonese corpora for language-related studies.
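The abstract does not say how precision and recall were computed. As an illustration only, the sketch below shows one common way to score an automatic segmentation against a manual one: a word counts as correct when its start and end character offsets match a word in the manual segmentation. The function names and the sample utterance are hypothetical and are not taken from the paper or from SPPAS.

```python
def word_spans(words):
    """Convert a list of words into a set of (start, end) character offsets."""
    spans = set()
    start = 0
    for word in words:
        end = start + len(word)
        spans.add((start, end))
        start = end
    return spans


def precision_recall(reference, hypothesis):
    """Score a hypothesis segmentation against a manual reference.

    Both arguments are lists of words covering the same character string.
    A hypothesis word is correct only if both of its boundaries match
    a word in the reference.
    """
    if "".join(reference) != "".join(hypothesis):
        raise ValueError("segmentations must cover the same text")
    ref = word_spans(reference)
    hyp = word_spans(hypothesis)
    correct = len(ref & hyp)
    precision = correct / len(hyp) if hyp else 0.0
    recall = correct / len(ref) if ref else 0.0
    return precision, recall


# Invented example utterance (not from the evaluated corpus).
manual = ["我哋", "去", "食飯"]
automatic = ["我", "哋", "去", "食飯"]
precision, recall = precision_recall(manual, automatic)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this invented example the automatic output splits the first word in two, so two of its four words match the manual segmentation (precision 0.50) and two of the three manual words are recovered (recall 0.67).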
Pages: 196-201
Number of pages: 6