SPEECH RECOGNITION ROBUST AGAINST SPEECH OVERLAPPING IN MONAURAL RECORDINGS OF TELEPHONE CONVERSATIONS

Cited by: 0

Authors:

Suzuki, Masayuki [1]

Kurata, Gakuto [1]

Nagano, Tohru [1]

Tachibana, Ryuki [1]

Affiliations:

[1] IBM Corp, Watson Multimodal, Tokyo, Japan

Source:

2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS | 2016

Keywords:

Overlap; Monaural speech; Garbage model; Noise robust; Telephone conversation; Speaker diarization

DOI:

Not available

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Subject classification codes:

070206; 082403

Abstract:

Monaural (single-channel) recording is sometimes used for telephone conversations in call centers. Generally speaking, the accuracy of automatic speech recognition on a monaural recording is worse than on a multi-channel recording of the same conversation in which each speaker's voice is recorded separately. The major reason is that the recognition system fails not only in the overlapping segments, where the voices of multiple speakers overlap, but also in the neighboring segments surrounding them. In this paper, we tackle this problem by combining garbage modeling with noise-robust monaural acoustic modeling. Our proposed method trains the models using multi-channel recordings and transcripts, which are relatively easier to prepare than monaural recordings and transcripts. We present experimental results in which the proposed methods reduced the error rates by approximately 3% relative to the baseline methods for both the GMM-HMM and CNN-HMM cases. Because the proposed method is quite simple, it is easy to deploy to a wide range of ASR systems for monaural speech transcription.

Pages: 5685-5689

Page count: 5

