Robust speech detection method for telephone speech recognition system

Cited by: 11

Authors:

Kuroiwa, S [1]

Naito, M [1]

Yamamoto, S [1]

Higuchi, N [1]

Affiliations:

[1] KDD R&D Labs Inc, Kamifukuoka, Saitama 3566502, Japan

Source:

SPEECH COMMUNICATION | 1999 / Vol. 27 / No. 2

Keywords:

speech recognition; telephone; endpoint detection; irrelevant sounds; garbage model;

DOI:

10.1016/S0167-6393(98)00072-7

Chinese Library Classification number:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

This paper describes speech endpoint detection methods for continuous speech recognition systems used over telephone networks. Speech input to these systems may be contaminated not only by various ambient noises but also by irrelevant sounds generated by users, such as coughs, tongue clicks, lip noises and certain out-of-task utterances. Under these adverse conditions, robust speech endpoint detection remains an unsolved problem. In fact, we found that speech endpoint detection errors occurred in over 10% of the inputs in field trials of a voice-activated telephone extension system. These errors were caused by (1) low SNR, (2) long pauses between phrases and (3) irrelevant sounds preceding task sentences. To solve the first two problems, we propose a real-time speech ending point detection algorithm based on the implicit approach, which finds a sentence end by comparing the likelihood of a complete sentence hypothesis with that of other hypotheses. For the third problem, we propose a speech beginning point detection algorithm which rejects irrelevant sounds by using likelihood ratio and duration conditions. The effectiveness of these methods was evaluated under various conditions. As a result, we found that the ending point detection algorithm was not affected by long pauses and that the beginning point detection algorithm successfully rejected irrelevant sounds by using phone HMMs that fit the task. Furthermore, a garbage model of irrelevant sounds was also evaluated; we found that the garbage modeling technique and the proposed method compensated for each other's weaknesses and that the best recognition accuracy was achieved by integrating these methods. (C) 1999 Elsevier Science B.V. All rights reserved.
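
The abstract mentions rejecting irrelevant sounds with "likelihood ratio and duration conditions" but does not give the exact rule. The Python sketch below is only an illustration of how such a pair of conditions could be combined, not the authors' algorithm. It assumes a per-frame log-likelihood ratio (speech model score minus background model score) has already been computed; the function name `detect_speech_start` and the parameters `ratio_threshold` and `min_duration_frames` are hypothetical.

```python
from typing import Optional, Sequence


def detect_speech_start(
    log_likelihood_ratios: Sequence[float],
    ratio_threshold: float,
    min_duration_frames: int,
) -> Optional[int]:
    """Return the first frame of the first run that passes both conditions.

    Illustrative assumption, not the paper's method: a frame is
    "speech-like" when its log-likelihood ratio exceeds ratio_threshold,
    and a run of speech-like frames is accepted as a beginning point only
    if it lasts at least min_duration_frames. Shorter runs (for example a
    cough or a lip noise) are rejected.
    """
    run_start: Optional[int] = None
    for i, llr in enumerate(log_likelihood_ratios):
        if llr > ratio_threshold:
            if run_start is None:
                run_start = i  # a new candidate run begins here
            if i - run_start + 1 >= min_duration_frames:
                return run_start  # duration condition satisfied
        else:
            run_start = None  # run broken: discard the candidate
    return None  # no run satisfied both conditions


# A one-frame burst at frame 1 is rejected; the run starting at frame 3 is kept.
start = detect_speech_start([-1.0, 2.0, -1.0, 2.0, 2.0, 2.0, -1.0], 0.0, 3)
print(start)
```

In a real recognizer the ratio would come from acoustic models (the abstract refers to phone HMMs that fit the task), and the threshold and minimum duration would have to be tuned on field data.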

Pages: 135 - 148

Page count: 14

50 items in total

[21] Japanese speech databases for robust speech recognition
Nakamura, A
Matsunaga, S
Shimizu, T
Tonomura, M
Sagisaka, Y
[J]. ICSLP 96 - FOURTH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, VOLS 1-4, 1996, : 2199 - 2202
[22] Robust Automatic Speech Recognition System for the Recognition of Continuous Kannada Speech Sentences in the Presence of Noise
[J]. Wireless Personal Communications, 2023, 130 : 2039 - 2058
[23] The IBM 2016 English Conversational Telephone Speech Recognition System
Saon, George
Sercu, Tom
Rennie, Steven
Kuo, Hong-Kwang J.
[J]. 17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 7 - 11
[24] The IBM 2015 English Conversational Telephone Speech Recognition System
Saon, George
Kuo, Hong-Kwang J.
Rennie, Steven
Picheny, Michael
[J]. 16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 3140 - 3144
[25] Robust Automatic Speech Recognition System for the Recognition of Continuous Kannada Speech Sentences in the Presence of Noise
Mahadevaswamy
[J]. WIRELESS PERSONAL COMMUNICATIONS, 2023, 130 (03) : 2039 - 2058
[26] Telephone speech recognition applications at IRST
Falavigna, D
Gretter, R
[J]. 1998 IEEE 4TH WORKSHOP INTERACTIVE VOICE TECHNOLOGY FOR TELECOMMUNICATIONS APPLICATIONS - IVTTA '98, 1998, : 27 - 30
[27] Improvements in recognition of conversational telephone speech
Peskin, B
Newman, M
McAllaster, D
Nagesha, V
Richards, H
Wegmann, S
Hunt, M
Gillick, L
[J]. ICASSP '99: 1999 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, PROCEEDINGS VOLS I-VI, 1999, : 53 - 56
[28] Conversational telephone speech recognition for Lithuanian
Lileikyte, Rasa
Lamel, Lori
Gauvain, Jean-Luc
Gorin, Arseniy
[J]. COMPUTER SPEECH AND LANGUAGE, 2018, 49 : 71 - 82
[29] A Front-End Speech Enhancement System for Robust Automotive Speech Recognition
Wang, Haikun
Ye, Zhongfu
Chen, Jingdong
[J]. 2018 11TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2018, : 1 - 5
[30] A computational auditory scene analysis system for speech segregation and robust speech recognition
Shao, Yang
Srinivasan, Soundararajan
Jin, Zhaozhang
Wang, DeLiang
[J]. COMPUTER SPEECH AND LANGUAGE, 2010, 24 (01) : 77 - 93
