SPEECH RECOGNITION ROBUST AGAINST SPEECH OVERLAPPING IN MONAURAL RECORDINGS OF TELEPHONE CONVERSATIONS

Cited by: 0

Authors:

Suzuki, Masayuki [1]

Kurata, Gakuto [1]

Nagano, Tohru [1]

Tachibana, Ryuki [1]

Affiliation:

[1] IBM Corp, Watson Multimodal, Tokyo, Japan

Source:

2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS | 2016

Keywords:

Overlap; Monaural speech; Garbage model; Noise robust; Telephone conversation; SPEAKER DIARIZATION

DOI:

Not available

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Subject classification codes:

070206; 082403

Abstract:

Monaural (single-channel) recording is sometimes used for telephone conversations in call centers. Generally speaking, the accuracy of automatic speech recognition on a monaural recording is worse than on a multi-channel recording of the same conversation, where each speaker's voice is recorded separately. The major reason is that the recognition system fails not only in the overlapping segments, where the voices of multiple speakers overlap, but also in the neighboring segments surrounding them. In this paper, we tackle this problem by using a combination of garbage modeling and noise-robust monaural acoustic modeling. Our proposed method trains the models by making use of multi-channel recordings and transcripts, which are easier to prepare than monaural recordings and transcripts. We present experimental results in which the proposed methods reduced the error rates by approximately 3% relative to the baseline methods for both the GMM-HMM and CNN-HMM cases. Because the proposed method is quite simple, it is easy to deploy to a wide range of ASR systems for monaural speech transcription.
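The abstract says only that the models are trained from multi-channel recordings and transcripts; it does not describe the procedure. As an illustration of how such data could be used, and not as the authors' method, the sketch below assumes a two-channel recording with per-channel segment times taken from the transcripts. It downmixes the channels into a simulated monaural signal and lists the time spans where the two speakers overlap. The function names `downmix` and `overlap_intervals` and the data layout are hypothetical.

```python
def downmix(channel_a, channel_b):
    """Average two equal-length sample lists into one simulated monaural signal."""
    if len(channel_a) != len(channel_b):
        raise ValueError("channels must have the same number of samples")
    return [(x + y) / 2 for x, y in zip(channel_a, channel_b)]


def overlap_intervals(segments_a, segments_b):
    """Return the (start, end) spans, in seconds, where a segment from
    speaker A and a segment from speaker B are active at the same time."""
    overlaps = []
    for a_start, a_end in segments_a:
        for b_start, b_end in segments_b:
            start = max(a_start, b_start)
            end = min(a_end, b_end)
            if start < end:  # keep only non-empty intersections
                overlaps.append((start, end))
    return sorted(overlaps)


# Hypothetical segment times for the two speakers of one call.
speaker_a = [(0.0, 2.0), (5.0, 7.0)]
speaker_b = [(1.5, 3.0), (6.5, 8.0)]
print(overlap_intervals(speaker_a, speaker_b))
```

Spans found this way could, for example, be the regions assigned to a garbage model during training, but how the paper actually labels or uses them is not stated in the abstract.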

Pages: 5685-5689

Number of pages: 5
