Using prosody to improve Mandarin automatic speech recognition

Cited by: 0

Authors:

Ni, Chong-Jia [1]

Liu, Wen-Ju [1]

Xu, Bo [1]

Affiliation:

[1] Chinese Academy of Sciences, National Laboratory of Pattern Recognition, Institute of Automation, Beijing 100190, People's Republic of China

Source:

11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 3 AND 4 | 2010

Keywords:

automatic speech recognition; prosody; MSD-HSMM; Maximum Entropy; CORPUS;

DOI:

Not available

Chinese Library Classification (CLC) number:

TM [Electrical engineering]; TN [Electronic technology, communication technology];

Subject classification codes:

0808 ; 0809 ;

Abstract:

This paper discusses how to model and train prosody-dependent acoustic models for Mandarin, and how to decode input speech with a prosody-dependent speech recognition system. We use automatic prosody labeling methods to annotate the prosodic break type and stress type of each syllable in a continuous speech corpus, and apply our proposed methods to train prosody-dependent tonal syllable models, addressing the data sparsity that arises after prosody labeling. We also use MSD-HSMM to model prosodic factors such as pitch and duration, and decode by combining the MSD-HSMM model, a GMM-based prosody-dependent tonal syllable duration model, and a Maximum Entropy-based syntactic prosody model. Compared with the baseline system, our prosody-dependent speech recognition systems significantly improve the tonal syllable correct rate.

Pages: 2698-2701

Number of pages: 4

50 items in total

[21] Hierarchical prosody modeling for Mandarin spontaneous speech
Lin, Cheng-Hsien
You, Chung-Long
Chiang, Chen-Yu
Wang, Yih-Ru
Chen, Sin-Horng
JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2019, 145 (04) : 2576 - 2596
[22] LATENT PROSODY MODEL OF CONTINUOUS MANDARIN SPEECH
Chiang, Chen-Yu
Wang, Xiao-Dong
Liao, Yuan-Fu
Wang, Yih-Ru
Chen, Sin-Horng
Hirose, Keikichi
2007 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL IV, PTS 1-3, 2007, : 625 - +
[23] A parametric prosody coding approach for Mandarin speech using a hierarchical prosodic model
Chiang, Chen-Yu
EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2018,
[24] Prosody Conversion for Emotional Mandarin Speech Synthesis Using the Tone Nucleus Model
Wen, Miaomiao
Wang, Miaomiao
Hirose, Keikichi
Minematsu, Nobuaki
12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011, : 2808 - +
[25] Automatic context induction for tone model integration in Mandarin speech recognition
HUANG Hao
The Journal of China Universities of Posts and Telecommunications, 2012, (01) : 94 - 100
[26] Mandarin telephone: Speech recognition for automatic telephone number directory service
Wang, YR
Chen, SH
PROCEEDINGS OF THE 1998 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-6, 1998, : 841 - 844
[27] Automatic context induction for tone model integration in Mandarin speech recognition
HUANG Hao
LI Binghu
(Department of Information Science and Engineering, Xinjiang University, Urumqi, China; Laboratory of Multi-Lingual Information Technology, Xinjiang University, Urumqi, China)
The Journal of China Universities of Posts and Telecommunications, 2012, 19 (01) : 94 - 100
[28] Automatic Pronunciation Scoring for Mandarin Proficiency Test based on Speech Recognition
Liu, Yang
Yang, Chunting
Ma, Weifeng
2009 INTERNATIONAL SYMPOSIUM ON INTELLIGENT UBIQUITOUS COMPUTING AND EDUCATION, 2009, : 168 - 171
[29] BART based semantic correction for Mandarin automatic speech recognition system
Zhao, Yun
Yang, Xuerui
Wang, Jinchao
Gao, Yongyu
Yan, Chao
Zhou, Yuanfu
INTERSPEECH 2021, 2021, : 2017 - 2021
[30] A parametric prosody coding approach for Mandarin speech using a hierarchical prosodic model
Chen-Yu Chiang
EURASIP Journal on Audio, Speech, and Music Processing, 2018
