Using prosody to improve Mandarin automatic speech recognition

Citations: 0
Authors
Ni, Chong-Jia [1]
Liu, Wen-Ju [1]
Xu, Bo [1]
Affiliations
[1] Chinese Acad Sci, Natl Lab Pattern Recognit, Inst Automat, Beijing 100190, Peoples R China
Keywords
automatic speech recognition; prosody; MSD-HSMM; Maximum Entropy; CORPUS;
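DOI
Not available
Chinese Library Classification
TM [Electrical engineering]; TN [Electronic technology, communication technology];
Discipline classification codes
0808 ; 0809 ;
Abstract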
This paper addresses how to model and train a prosody-dependent acoustic model for Mandarin, and how to decode input speech with a prosody-dependent speech recognition system. We use automatic prosody labeling methods to annotate the prosodic break type and stress type of each syllable in a continuous speech corpus, and apply our proposed methods to train prosody-dependent tonal syllable models in a way that addresses the data sparsity that arises after prosody labeling. We also use MSD-HSMM to model pitch, duration, and other factors that influence prosody. For decoding, we combine the MSD-HSMM model, a GMM-based prosody-dependent tonal syllable duration model, and a Maximum Entropy-based syntactic prosody model. Compared with the baseline system, our prosody-dependent speech recognition systems significantly improve the tonal syllable correct rate.
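The abstract does not say how the three model scores are combined during decoding. One common option, shown here purely as an illustrative sketch and not as the authors' method, is a weighted sum of log-scores used to rescore candidate hypotheses. The class `Hypothesis`, the functions `combined_score` and `rescore`, the weights `w_dur` and `w_pros`, and all numeric values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    """One candidate tonal-syllable sequence with per-model log-scores.

    All fields are hypothetical stand-ins, not values from the paper.
    """
    syllables: str
    acoustic_logp: float  # stand-in for an MSD-HSMM acoustic log-score
    duration_logp: float  # stand-in for a GMM duration-model log-score
    prosody_logp: float   # stand-in for a Maximum Entropy prosody log-score


def combined_score(h: Hypothesis, w_dur: float = 1.0, w_pros: float = 1.0) -> float:
    """Weighted sum of log-scores (an assumed log-linear combination)."""
    return h.acoustic_logp + w_dur * h.duration_logp + w_pros * h.prosody_logp


def rescore(hyps: list[Hypothesis], w_dur: float = 1.0, w_pros: float = 1.0) -> Hypothesis:
    """Return the hypothesis with the highest combined score."""
    return max(hyps, key=lambda h: combined_score(h, w_dur, w_pros))


# Made-up scores: the second candidate is slightly worse acoustically
# but fits the duration and prosody models better.
hyps = [
    Hypothesis("ma1", acoustic_logp=-10.0, duration_logp=-2.0, prosody_logp=-3.0),
    Hypothesis("ma3", acoustic_logp=-10.5, duration_logp=-1.0, prosody_logp=-1.0),
]
best = rescore(hyps)
```

With both weights set to zero the ranking falls back to the acoustic score alone, which makes it easy to see what the prosodic terms contribute in this toy setting.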
Pages: 2698 - 2701
Number of pages: 4
Related papers
50 in total
  • [21] Hierarchical prosody modeling for Mandarin spontaneous speech
    Lin, Cheng-Hsien
    You, Chung-Long
    Chiang, Chen-Yu
    Wang, Yih-Ru
    Chen, Sin-Horng
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2019, 145 (04): : 2576 - 2596
  • [22] LATENT PROSODY MODEL OF CONTINUOUS MANDARIN SPEECH
    Chiang, Chen-Yu
    Wang, Xiao-Dong
    Liao, Yuan-Fu
    Wang, Yih-Ru
    Chen, Sin-Horng
    Hirose, Keikichi
    2007 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL IV, PTS 1-3, 2007, : 625 - +
  • [23] A parametric prosody coding approach for Mandarin speech using a hierarchical prosodic model
    Chiang, Chen-Yu
    EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2018,
  • [24] Prosody Conversion for Emotional Mandarin Speech Synthesis Using the Tone Nucleus Model
    Wen, Miaomiao
    Wang, Miaomiao
    Hirose, Keikichi
    Minematsu, Nobuaki
    12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011, : 2808 - +
  • [26] Mandarin telephone: Speech recognition for automatic telephone number directory service
    Wang, YR
    Chen, SH
    PROCEEDINGS OF THE 1998 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-6, 1998, : 841 - 844
  • [27] Automatic context induction for tone model integration in mandarin speech recognition
    Huang, Hao
    Li, Binghu
    Department of Information Science and Engineering, Xinjiang University, Urumqi, China; Laboratory of Multi-Lingual Information Technology, Xinjiang University, Urumqi, China
    The Journal of China Universities of Posts and Telecommunications, 2012, 19 (01) : 94 - 100
  • [28] Automatic Pronunciation Scoring for Mandarin Proficiency Test based on Speech Recognition
    Liu, Yang
    Yang, Chunting
    Ma, Weifeng
    2009 INTERNATIONAL SYMPOSIUM ON INTELLIGENT UBIQUITOUS COMPUTING AND EDUCATION, 2009, : 168 - 171
  • [29] BART based semantic correction for Mandarin automatic speech recognition system
    Zhao, Yun
    Yang, Xuerui
    Wang, Jinchao
    Gao, Yongyu
    Yan, Chao
    Zhou, Yuanfu
    INTERSPEECH 2021, 2021, : 2017 - 2021