Noise-Robust speech recognition of Conversational Telephone Speech

Cited by: 0
Authors
Chen, Gang [1]
Tolba, Hesham [1]
O'Shaughnessy, Douglas [1]
Affiliations
[1] Univ Quebec, Ste Foy, PQ G1V 2M3, Canada
Keywords
speech recognition; H.323; telephone speech
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Over the past several years, speech recognition research has focused primarily on speech transmitted over the telephone or IP networks, and IP telephony has recently come into increasingly wide use. This paper describes the performance of a speech recognizer on noisy speech transmitted over an H.323 IP telephony network, where the minimum mean-square error log-spectral amplitude (MMSE-LSA) method [1,2] is used to reduce the mismatch between training and deployment conditions in order to achieve robust speech recognition. In the H.323 network environment, the sources of speech distortion are packet loss and additive noise. We first evaluate the impact of packet loss on recognition performance and then explore the effect of uncorrelated additive noise, using seven types of noise source. The experimental results indicate that MMSE-LSA enhancement increases robustness for some types of additive noise at certain packet loss rates over the H.323 telephone network.
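The two distortion sources named in the abstract, additive noise and packet loss, can be illustrated with a minimal sketch. It is not the authors' experimental setup: the 8 kHz sampling rate, the 160-sample (20 ms) packets, the independent per-packet loss model, the replacement of lost packets by silence, and the function names `add_noise_at_snr` and `drop_packets` are all assumptions made for illustration. MMSE-LSA enhancement itself is not implemented here.

```python
import math
import random


def add_noise_at_snr(speech, noise, snr_db):
    """Scale `noise` so the mixture has the requested SNR in dB, then add it."""
    speech_power = sum(s * s for s in speech) / len(speech)
    noise_power = sum(n * n for n in noise) / len(noise)
    # Solve snr_db = 10 * log10(speech_power / (gain**2 * noise_power)) for gain.
    gain = math.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]


def drop_packets(samples, packet_len, loss_rate, rng):
    """Zero out whole packets, each lost independently with probability `loss_rate`.

    Substituting silence for a lost packet is an assumption of this sketch.
    """
    out = list(samples)
    for start in range(0, len(out), packet_len):
        if rng.random() < loss_rate:
            end = min(start + packet_len, len(out))
            out[start:end] = [0.0] * (end - start)
    return out


# Hypothetical test signal: 1 s of a 440 Hz tone at an assumed 8 kHz rate.
rng = random.Random(0)
speech = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(8000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(8000)]

noisy = add_noise_at_snr(speech, noise, snr_db=10.0)
degraded = drop_packets(noisy, packet_len=160, loss_rate=0.05, rng=rng)
```

Applying the noise before the packet loss models acoustic noise at the talker's end followed by network degradation; the reverse order would model noise added at the receiver instead.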
Pages: 1101-1104
Page count: 4