Mandarin pronunciation modeling based on CASS corpus

Cited by: 8
Authors
Zheng, F [1]
Song, ZJ
Fung, P
Byrne, W
Affiliations
[1] Tsing Hua Univ, Dept Comp Sci & Technol, State Key Lab Intelligent Technol & Syst, Ctr Speech Technol, Beijing 100084, Peoples R China
[2] Hong Kong Univ Sci & Technol, Dept Elect & Elect Engn, Hong Kong, Hong Kong, Peoples R China
[3] Johns Hopkins Univ, Ctr Language & Speech Proc, Baltimore, MD 21218 USA
Source
Funding
National Science Foundation (United States);
Keywords
pronunciation modeling; generalized initial and final; generalized syllable; refined acoustic modeling; context-dependent weighting; iterative forced-alignment based transcribing;
DOI
10.1007/BF02947304
Chinese Library Classification
TP3 [Computing technology, computer technology];
Subject classification code
0812;
Abstract
Pronunciation variability is an important issue that must be addressed when developing practical automatic spontaneous speech recognition systems. In this paper, the factors that may affect recognition performance are analyzed, including those specific to the Chinese language. By studying the INITIAL/FINAL (IF) characteristics of Chinese and developing the Bayesian equation, the following are proposed on the basis of a seed database with good phonetic transcription: the concepts of generalized INITIAL/FINAL (GIF) and generalized syllable (GS), GIF modeling and IF-GIF modeling, and context-dependent pronunciation weighting. With these methods, the Chinese syllable error rate (SER) is reduced by 6.3% and 4.2% compared with GIF modeling and IF modeling, respectively, when no language model, such as a syllable or word N-gram, is used. The methods also prove effective when additional data without phonetic transcription are used to refine the acoustic model through the proposed iterative forced-alignment based transcribing (IFABT) method, achieving a 5.7% SER reduction.
Pages: 249-263
Page count: 15