SPEECH RECOGNITION ROBUST AGAINST SPEECH OVERLAPPING IN MONAURAL RECORDINGS OF TELEPHONE CONVERSATIONS

被引:0
|
作者
Suzuki, Masayuki [1 ]
Kurata, Gakuto [1 ]
Nagano, Tohru [1 ]
Tachibana, Ryuki [1 ]
机构
[1] IBM Corp, Watson Multimodal, Tokyo, Japan
关键词
Overlap; Monaural speech; Garbage model; Noise robust; Telephone conversation; SPEAKER DIARIZATION;
D O I
暂无
中图分类号
O42 [声学];
学科分类号
070206 ; 082403 ;
摘要
Monaural (single-channel) recording is sometimes used for telephone conversations in call centers. Generally speaking, the accuracy of automatic speech recognition of a monaural recording is worse than that of the multi-channel recording of the same conversation where each speaker's voice is separately recorded. The major reason is that the recognition system fails not only at the overlapping segments where the voices of the multiple speakers overlap, but also at the neighboring segments surrounding the overlapping segments. In this paper, we tackle this problem by using a combination of garbage modeling and noise-robust monaural acoustic modeling. Our proposed method trains the models by making use of multi-channel recordings and transcripts, which are relatively easy to prepare than monaural recordings and transcripts. We present experimental results where the proposed methods reduced the error rates by approximately 3% relative to the baseline methods for both of GMM-HMM and CNN-HMM cases. Because the proposed method is quite simple, the proposed method is easy to deploy to wide range of ASR systems for monaural speech transcription.
引用
收藏
页码:5685 / 5689
页数:5
相关论文
共 50 条
  • [31] Detecting breathing sounds in realistic Japanese telephone conversations and its application to automatic speech recognition
    Fukuda, Takashi
    Ichikawa, Osamu
    Nishimura, Masafumi
    [J]. SPEECH COMMUNICATION, 2018, 98 : 95 - 103
  • [32] PERFORMANCE OF HARPY SPEECH RECOGNITION SYSTEM FOR TELEPHONE QUALITY SPEECH INPUT
    YEGNANARAYANA, B
    REDDY, DR
    [J]. JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1978, 63 : S78 - S78
  • [33] Generating and evaluating segmentations for automatic speech recognition of conversational telephone speech
    Tranter, SE
    Yu, K
    Evermann, G
    Woodland, RC
    [J]. 2004 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL I, PROCEEDINGS: SPEECH PROCESSING, 2004, : 753 - 756
  • [34] Histogram equalization of speech representation for robust speech recognition
    de la Torre, A
    Peinado, AM
    Segura, JC
    Pérez-Córdoba, JL
    Benítez, MC
    Rubio, AJ
    [J]. IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 2005, 13 (03): : 355 - 366
  • [35] Normalization of the Speech Modulation Spectra for Robust Speech Recognition
    Xiao, Xiong
    Chng, Eng Siong
    Li, Haizhou
    [J]. IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2008, 16 (08): : 1662 - 1674
  • [36] Robust distributed speech recognition using speech enhancement
    Flynn, Ronan
    Jones, Edward
    [J]. IEEE TRANSACTIONS ON CONSUMER ELECTRONICS, 2008, 54 (03) : 1267 - 1273
  • [37] CASA Based Speech Separation for Robust Speech Recognition
    Han Runqiang
    Zhao Pei
    Gao Qin
    Zhang Zhiping
    Wu Hao
    Wu Xihong
    [J]. INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006, : 77 - 80
  • [38] Compensation of speech enhancement distortion for robust speech recognition
    Ding, P
    Cao, ZG
    [J]. 2002 IEEE REGION 10 CONFERENCE ON COMPUTERS, COMMUNICATIONS, CONTROL AND POWER ENGINEERING, VOLS I-III, PROCEEDINGS, 2002, : 449 - 452
  • [39] A Robust Speech Recognition System against the Ego Noise of a Robot
    Ince, Goekhan
    Nakadai, Kazuhiro
    Rodemann, Tobias
    Tsujino, Hiroshi
    Imura, Jun-ichi
    [J]. 11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 3 AND 4, 2010, : 2070 - +
  • [40] Speech/music discrimination for robust speech recognition in robots
    Choi, Mu Yeol
    Song, Hwa Jeon
    Kim, Hyung Soon
    [J]. 2007 RO-MAN: 16TH IEEE INTERNATIONAL SYMPOSIUM ON ROBOT AND HUMAN INTERACTIVE COMMUNICATION, VOLS 1-3, 2007, : 118 - +