SPEECH RECOGNITION ROBUST AGAINST SPEECH OVERLAPPING IN MONAURAL RECORDINGS OF TELEPHONE CONVERSATIONS

Cited by: 0
Authors
Suzuki, Masayuki [1]
Kurata, Gakuto [1]
Nagano, Tohru [1]
Tachibana, Ryuki [1]
Affiliation
[1] IBM Corp, Watson Multimodal, Tokyo, Japan
Keywords
Overlap; Monaural speech; Garbage model; Noise robust; Telephone conversation; Speaker diarization
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
Monaural (single-channel) recording is sometimes used for telephone conversations in call centers. Generally speaking, the accuracy of automatic speech recognition (ASR) on a monaural recording is worse than on a multi-channel recording of the same conversation in which each speaker's voice is recorded separately. The major reason is that the recognition system fails not only in the segments where the speakers' voices overlap, but also in the neighboring segments surrounding them. In this paper, we tackle this problem by using a combination of garbage modeling and noise-robust monaural acoustic modeling. Our proposed method trains the models by making use of multi-channel recordings and transcripts, which are easier to prepare than monaural recordings and transcripts. We present experimental results in which the proposed methods reduced the error rates by approximately 3% relative to the baseline methods for both the GMM-HMM and CNN-HMM cases. Because the proposed method is quite simple, it is easy to deploy to a wide range of ASR systems for monaural speech transcription.
Pages: 5685-5689
Number of pages: 5