Robust speech detection method for telephone speech recognition system

Cited by: 11
Authors
Kuroiwa, S [1]
Naito, M [1]
Yamamoto, S [1]
Higuchi, N [1]
Affiliation
[1] KDD R&D Labs Inc, Kamifukuoka, Saitama 3566502, Japan
Keywords
speech recognition; telephone; endpoint detection; irrelevant sounds; garbage model;
DOI
10.1016/S0167-6393(98)00072-7
Chinese Library Classification
O42 [Acoustics];
Discipline classification codes
070206 ; 082403 ;
Abstract
This paper describes speech endpoint detection methods for continuous speech recognition systems used over telephone networks. Speech input to these systems may be contaminated not only by various ambient noises but also by irrelevant sounds generated by users, such as coughs, tongue clicks, lip noises and certain out-of-task utterances. Under these adverse conditions, robust speech endpoint detection remains an unsolved problem. In fact, we found that speech endpoint detection errors occurred in over 10% of the inputs in field trials of a voice-activated telephone extension system. These errors were caused by (1) low SNR, (2) long pauses between phrases and (3) irrelevant sounds preceding task sentences. To solve the first two problems, we propose a real-time speech ending point detection algorithm based on the implicit approach, which finds a sentence end by comparing the likelihood of a complete-sentence hypothesis with that of other hypotheses. For the third problem, we propose a speech beginning point detection algorithm that rejects irrelevant sounds by using likelihood ratio and duration conditions. The effectiveness of these methods was evaluated under various conditions. We found that the ending point detection algorithm was not affected by long pauses and that the beginning point detection algorithm successfully rejected irrelevant sounds by using phone HMMs that fit the task. Furthermore, a garbage model of irrelevant sounds was also evaluated; we found that the garbage modeling technique and the proposed method compensated for each other's weak points and that the best recognition accuracy was achieved by integrating these methods. (C) 1999 Elsevier Science B.V. All rights reserved.
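The abstract names likelihood ratio and duration conditions as the basis for rejecting irrelevant sounds at the beginning point, but gives no formulas. The following Python sketch is one possible reading, not the authors' algorithm: it assumes per-frame log-likelihoods are already available from a speech model and a background model, and every name and threshold in it (`detect_beginning`, `frame_threshold`, `mean_llr_threshold`, `min_frames`) is hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Segment:
    start: int  # index of the first frame (inclusive)
    end: int    # index one past the last frame (exclusive)


def candidate_segments(llr: Sequence[float], frame_threshold: float) -> List[Segment]:
    """Group consecutive frames whose log-likelihood ratio exceeds frame_threshold."""
    segments: List[Segment] = []
    start: Optional[int] = None
    for i, value in enumerate(llr):
        if value > frame_threshold and start is None:
            start = i                      # a candidate segment opens here
        elif value <= frame_threshold and start is not None:
            segments.append(Segment(start, i))
            start = None                   # the candidate segment closes
    if start is not None:                  # segment still open at end of input
        segments.append(Segment(start, len(llr)))
    return segments


def detect_beginning(
    speech_ll: Sequence[float],
    noise_ll: Sequence[float],
    frame_threshold: float = 0.0,      # hypothetical value
    mean_llr_threshold: float = 1.0,   # hypothetical value
    min_frames: int = 5,               # hypothetical value
) -> Optional[int]:
    """Return the first frame of the first segment that passes both conditions.

    Assumed conditions (illustrative only):
      * likelihood ratio: mean log-likelihood ratio over the segment is high enough
      * duration: the segment lasts at least min_frames frames
    """
    # Per-frame log-likelihood ratio between the speech and background models.
    llr = [s - n for s, n in zip(speech_ll, noise_ll)]
    for seg in candidate_segments(llr, frame_threshold):
        duration = seg.end - seg.start
        mean_llr = sum(llr[seg.start:seg.end]) / duration
        if duration >= min_frames and mean_llr >= mean_llr_threshold:
            return seg.start
    return None  # nothing that looks like task speech was found
```

In this sketch a short, loud event such as a tongue click fails the duration condition, while a long but poorly matching sound fails the likelihood ratio condition, so only a segment that satisfies both is accepted as the beginning point.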
Pages: 135-148
Number of pages: 14