Robust F0 Modeling for Mandarin Speech Recognition in Noise

Authors
Qiang, Sheng [1]
Qian, Yao
Soong, Frank K.
Xu, Congfu [1]
Affiliations
[1] Zhejiang Univ, Coll Comp Sci & Technol, Hangzhou, Zhejiang, Peoples R China
Keywords
Tone model; Mandarin speech recognition; MSD; Noisy digit recognition
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The F0 contour plays an important role in recognizing spoken tonal languages such as Mandarin Chinese. However, the discontinuity of F0 at transitions between voiced and unvoiced segments has traditionally been a bottleneck in building a succinct statistical tone model for automatic speech recognition. By successfully applying the Multi-Space Distribution (MSD) to tone modeling, we recently reported a 24% relative reduction in tonal syllable errors on a Mandarin speech database. In this paper, we further test MSD on a noisy, continuous Mandarin digit recognition task in which eight types of noise are added to clean speech signals at five SNRs. The experimental results show that our MSD-based digit models significantly improve recognition performance in noise over a baseline system. Relative digit error rate reductions of 19.1% and 15.0% are obtained for noise types seen and unseen in the training data, respectively. These improvements also exceed those of other reference systems that incorporate F0 information.
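To illustrate how a multi-space distribution can handle the F0 discontinuity described in the abstract, here is a minimal sketch, not the authors' implementation. It scores one frame with a two-space model: a one-dimensional Gaussian for voiced frames and a zero-dimensional space for unvoiced frames. The class name `MsdF0State`, the use of log-F0, the single Gaussian per state, and all parameter values are assumptions made for illustration.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class MsdF0State:
    """Hypothetical two-space MSD state for F0 (illustrative only)."""
    voiced_weight: float  # weight of the voiced (1-D Gaussian) space, in (0, 1)
    mean: float           # mean of log-F0 in the voiced space (assumed feature)
    var: float            # variance of log-F0 in the voiced space

    def log_likelihood(self, log_f0: Optional[float]) -> float:
        """Log observation probability of one frame.

        An unvoiced frame is passed as None: it carries no F0 value, so its
        probability is just the weight of the zero-dimensional space.
        """
        if log_f0 is None:
            return math.log(1.0 - self.voiced_weight)
        # Voiced frame: space weight times the Gaussian density of log-F0.
        log_gauss = -0.5 * (
            math.log(2.0 * math.pi * self.var)
            + (log_f0 - self.mean) ** 2 / self.var
        )
        return math.log(self.voiced_weight) + log_gauss


# Usage: score a short sequence mixing voiced and unvoiced frames.
state = MsdF0State(voiced_weight=0.8, mean=5.0, var=0.04)  # made-up parameters
frames = [None, 4.9, 5.0, 5.1, None]
total = sum(state.log_likelihood(f) for f in frames)
print(f"sequence log-likelihood: {total:.3f}")
```

Because unvoiced frames are scored by a space weight rather than by an interpolated or placeholder F0 value, the model does not need a continuous F0 track across voiced and unvoiced regions.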
Pages: 1101 / +
Page count: 2