Robust speech detection method for telephone speech recognition system

Cited by: 11
Authors
Kuroiwa, S [1]
Naito, M [1]
Yamamoto, S [1]
Higuchi, N [1]
Affiliation
[1] KDD R&D Labs Inc, Kamifukuoka, Saitama 3566502, Japan
Keywords
speech recognition; telephone; endpoint detection; irrelevant sounds; garbage model
DOI
10.1016/S0167-6393(98)00072-7
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
This paper describes speech endpoint detection methods for continuous speech recognition systems used over telephone networks. Speech input to these systems may be contaminated not only by various ambient noises but also by irrelevant sounds generated by users, such as coughs, tongue clicks, lip noises and certain out-of-task utterances. Under these adverse conditions, robust speech endpoint detection remains an unsolved problem. In fact, we found that speech endpoint detection errors occurred in over 10% of the inputs in field trials of a voice-activated telephone extension system. These errors were caused by (1) low SNR, (2) long pauses between phrases and (3) irrelevant sounds prior to task sentences. To solve the first two problems, we propose a real-time speech ending point detection algorithm based on the implicit approach, which finds a sentence end by comparing the likelihood of a complete sentence hypothesis with that of other hypotheses. For the third problem, we propose a speech beginning point detection algorithm that rejects irrelevant sounds by using likelihood ratio and duration conditions. The effectiveness of these methods was evaluated under various conditions. We found that the ending point detection algorithm was not affected by long pauses and that the beginning point detection algorithm successfully rejected irrelevant sounds when it used phone HMMs that fit the task. Furthermore, a garbage model of irrelevant sounds was also evaluated; we found that the garbage modeling technique and the proposed method compensated for each other's weaknesses and that the best recognition accuracy was achieved by integrating the two. © 1999 Elsevier Science B.V. All rights reserved.
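The idea of combining a likelihood ratio condition with a duration condition for beginning point detection can be illustrated with a minimal sketch. This is not the authors' algorithm: the function name `detect_speech_start`, the per-frame log-likelihood inputs, and the parameters `ratio_threshold` and `min_frames` are all hypothetical, chosen only to show how a short burst such as a lip noise might be rejected while a sustained speech-like segment is accepted.

```python
from typing import Optional, Sequence


def detect_speech_start(
    speech_loglik: Sequence[float],
    background_loglik: Sequence[float],
    ratio_threshold: float = 2.0,  # hypothetical log-likelihood-ratio threshold
    min_frames: int = 5,           # hypothetical minimum duration, in frames
) -> Optional[int]:
    """Return the first frame of the first run that satisfies both conditions.

    A frame passes the likelihood ratio condition when its speech-model
    log-likelihood exceeds the background-model log-likelihood by at least
    ratio_threshold. A run passes the duration condition when it contains at
    least min_frames consecutive passing frames. Returns None if no run
    qualifies.
    """
    run_start: Optional[int] = None
    for i, (s, b) in enumerate(zip(speech_loglik, background_loglik)):
        if s - b >= ratio_threshold:
            if run_start is None:
                run_start = i  # a new candidate segment begins here
            if i - run_start + 1 >= min_frames:
                return run_start  # both conditions are met
        else:
            run_start = None  # run broken: discard the short segment
    return None
```

With illustrative inputs, an isolated one-frame burst is discarded and the later sustained segment is reported as the beginning point. In a real system the two likelihood streams would come from acoustic models, for example phone HMMs and a background model, which this sketch does not implement.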
Pages: 135-148
Page count: 14