Acoustic Modeling in Mandarin Speech Recognition of Minority Accent in Yunnan

Times cited: 0
Authors
Wu Peishan [1]
Yang Jian [1]
Affiliation
[1] Yunnan University, School of Information Science and Technology, Kunming 650091, People's Republic of China
Keywords
Speech Recognition; National Language in Yunnan; Pronunciation Variation; Multi-pronunciation Dictionary
DOI
10.1109/CHICC.2008.4605230
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Speakers' dialectal and nonnative accents pose a major challenge to the deployment and development of Mandarin speech recognition systems. This paper describes an integrated approach that combines a rule-based data-driven (DD) method with expert knowledge to build acoustic models for automatic speech recognition (ASR). The aim is to derive systematic pronunciation-variation pairs statistically and, on that basis, to construct a preliminary scheme for a Mandarin multi-pronunciation dictionary covering minority accents in Yunnan. The combined method consists of the following steps. First, baseline hidden Markov models (HMMs) were trained on the Project 863 standard Mandarin corpus. Second, nonnative speech data from the Dai, Lisu, and Naxi areas of Yunnan were transcribed with the baseline HMMs, and the recognized transcriptions were aligned with the reference transcriptions by dynamic programming. Third, a confusion matrix was computed, and the error pairs produced by substitutions were analyzed at the level of base syllables, initials, and finals. Finally, the systematic Mandarin pronunciation variations associated with the ethnic minority languages of Yunnan were examined. The experiments revealed many interesting linguistic phenomena that are necessary for advancing nonnative Mandarin speech recognition technology.
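The alignment, confusion-matrix, and dictionary steps described in the abstract can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes recognizer output and references are lists of toneless pinyin syllables, uses a standard unit-cost Levenshtein alignment as the dynamic-programming step, splits syllables into initials and finals with a simplified initial inventory that also treats y and w as initials, and keeps variant pairs using hypothetical thresholds (min_ratio, min_count). All function names are hypothetical, and the toy data are invented, not results from the paper.

```python
from collections import Counter

# Simplified pinyin initial inventory (assumption: toneless pinyin input; y/w are
# treated as initials, as some ASR phone sets do). Two-letter initials are listed
# first so that "zh", "ch", "sh" match before "z", "c", "s".
INITIALS = ["zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l", "g",
            "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w"]


def split_syllable(syl):
    """Split a toneless pinyin syllable into (initial, final); initial may be ''."""
    for ini in INITIALS:
        if syl.startswith(ini) and len(syl) > len(ini):
            return ini, syl[len(ini):]
    return "", syl


def align(ref, hyp):
    """Unit-cost Levenshtein alignment by dynamic programming.

    Returns (ref_token, hyp_token) pairs; None on the hypothesis side marks a
    deletion, None on the reference side marks an insertion.
    """
    n, m = len(ref), len(hyp)
    cost = [[0] * (m + 1) for _ in range(n + 1)]  # cost[i][j]: ref[:i] vs hyp[:j]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            cost[i][j] = min(sub, cost[i - 1][j] + 1, cost[i][j - 1] + 1)
    pairs, i, j = [], n, m  # backtrace from the bottom-right cell
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            pairs.append((ref[i - 1], hyp[j - 1]))  # match or substitution
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            pairs.append((ref[i - 1], None))  # deletion
            i -= 1
        else:
            pairs.append((None, hyp[j - 1]))  # insertion
            j -= 1
    return pairs[::-1]


def confusion(utterances, level="syllable"):
    """Sparse confusion matrix: Counter of (reference, recognized) -> count.

    Only aligned pairs (matches and substitutions) are counted; level selects
    whole syllables, initials, or finals.
    """
    part = {"syllable": lambda s: s,
            "initial": lambda s: split_syllable(s)[0],
            "final": lambda s: split_syllable(s)[1]}[level]
    matrix = Counter()
    for ref, hyp in utterances:
        for r, h in align(ref, hyp):
            if r is not None and h is not None:
                matrix[(part(r), part(h))] += 1
    return matrix


def variant_rules(matrix, min_ratio=0.1, min_count=2):
    """Keep off-diagonal (substitution) pairs that are frequent enough.

    min_ratio and min_count are hypothetical thresholds, not values from the paper.
    """
    totals = Counter()
    for (r, _), c in matrix.items():
        totals[r] += c
    return {(r, h): c / totals[r] for (r, h), c in matrix.items()
            if r != h and c >= min_count and c / totals[r] >= min_ratio}


def expand_lexicon(lexicon, rules):
    """Add single-substitution variants to a word -> syllable-list lexicon."""
    expanded = {}
    for word, pron in lexicon.items():
        variants = [pron]
        for k, syl in enumerate(pron):
            for r, h in rules:
                if r == syl:
                    variants.append(pron[:k] + [h] + pron[k + 1:])
        expanded[word] = variants
    return expanded


if __name__ == "__main__":
    # Invented toy data for illustration only (not results from the paper).
    data = [(["zhong", "guo", "ren"], ["zong", "guo", "len"])]
    print(confusion(data, level="initial"))
    rules = variant_rules(confusion(data), min_ratio=0.1, min_count=1)
    print(rules)  # {('zhong', 'zong'): 1.0, ('ren', 'len'): 1.0}
    print(expand_lexicon({"中国人": ["zhong", "guo", "ren"]}, rules))
```

The thresholds control a trade-off: each added variant can absorb an accent-specific substitution, but it also makes words more confusable with one another, so a real multi-pronunciation dictionary would typically keep only frequent, systematic pairs.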
Pages: 526-530
Number of pages: 5