An Innovative Prosody Modeling Method for Chinese Speech Recognition

Cited by: 0
Authors
Gang Peng
William S.-Y. Wang
Affiliation
[1] City University of Hong Kong, Language Engineering Laboratory, Department of Electronic Engineering
Keywords
Chinese dialects; speech recognition; prosody modeling; context-dependent
DOI
10.1023/B:IJST.0000017013.70486.51
Abstract
This paper presents an innovative method for prosody modeling in Chinese speech recognition. The method first evaluates the reliability of the prosodic information; the recognition system then uses this reliability to dynamically tune the balance between the spectral scores and the prosodic scores. The basic idea is to use prosodic knowledge in proportion to its reliability: the higher the reliability, the more the prosodic information contributes to recognition. Thus, the method incorporates more knowledge into the recognition system without introducing extra errors. In experiments, the method achieved relative word error rate reductions of up to 52.9% and 46.0% on Mandarin and Cantonese digit string recognition tasks, respectively. When tone information was incorporated into Cantonese Large Vocabulary Continuous Speech Recognition (LVCSR) via the proposed method, a 20.16% relative character error rate reduction was obtained.
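The reliability-weighted balance described in the abstract can be illustrated with a minimal sketch. The abstract does not give the actual combination formula, so the linear log-domain weighting below is an assumption, and the names combine_scores, pick_best, and max_weight are hypothetical, as are the example scores.

```python
from typing import List, Tuple

# Hypothetical hypothesis record: (label, spectral log-score, prosodic log-score).
Hypothesis = Tuple[str, float, float]


def combine_scores(spectral: float, prosodic: float,
                   reliability: float, max_weight: float = 1.0) -> float:
    """Add the prosodic score to the spectral score, scaled by reliability.

    Assumption (not from the paper): scores are log-domain and the
    prosodic weight grows linearly with a reliability value in [0, 1].
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie in [0, 1]")
    return spectral + max_weight * reliability * prosodic


def pick_best(hypotheses: List[Hypothesis], reliability: float) -> str:
    """Return the label of the hypothesis with the highest combined score."""
    best = max(hypotheses,
               key=lambda h: combine_scores(h[1], h[2], reliability))
    return best[0]


if __name__ == "__main__":
    # Made-up scores for two confusable Mandarin digits.
    candidates = [("si4", -10.0, -5.0), ("shi2", -10.5, -1.0)]
    for r in (0.0, 0.1, 1.0):
        print(r, pick_best(candidates, r))
```

With a reliability of zero the ranking reduces to the spectral scores alone, which mirrors the abstract's point that unreliable prosodic information should not be allowed to change the result.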
Pages: 129-140
Page count: 11