ROBUST AUTOMATIC RECOGNITION OF SPEECH WITH BACKGROUND MUSIC

Cited by: 0
Authors
Malek, Jiri [1 ]
Zdansky, Jindrich [1 ]
Cerva, Petr [1 ]
Affiliations
[1] Tech Univ Liberec, Fac Mechatron Informat & Interdisciplinary Studie, Studentska 2, Liberec 46117, Czech Republic
Keywords
Robust recognition; background music; feature enhancement; denoising autoencoder; multi-condition training;
DOI
Not available
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
This paper addresses the task of Automatic Speech Recognition (ASR) with music in the background, where recognition accuracy may deteriorate significantly. To improve the robustness of ASR in this task, e.g. for broadcast news transcription or subtitle creation, we adopt two approaches: 1) multi-condition training of the acoustic models and 2) denoising autoencoders followed by acoustic model training on the preprocessed data. In the latter case, two types of autoencoders are considered: the fully connected and the convolutional network. The presented experimental results show that all the investigated techniques significantly improve the recognition of speech distorted by music. For example, in the case of artificial mixtures of speech and electronic music (low Signal-to-Noise Ratio (SNR) of 0 dB), we achieved an absolute accuracy improvement of 35.8%. For real-world broadcast news and a high SNR (about 10 dB), we achieved an improvement of 2.4%. An important advantage of the studied approaches is that they barely affect the accuracy in scenarios with clean speech (the decrease is about 1%).
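The artificial speech-and-music mixtures at a given SNR mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' procedure: the function names `mean_power` and `mix_at_snr`, the toy sinusoidal signals, and the definition of SNR as the ratio of average speech power to average music power over the whole utterance are all assumptions made for illustration.

```python
import math


def mean_power(signal):
    """Average power (mean of squared samples) of a non-empty sequence."""
    if len(signal) == 0:
        raise ValueError("signal must not be empty")
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(speech, music, snr_db):
    """Add music to speech so that the speech-to-music power ratio is snr_db.

    Hypothetical helper: SNR is assumed to be measured over the whole
    utterance. Returns the mixture and the gain applied to the music.
    """
    if len(music) < len(speech):
        raise ValueError("music must be at least as long as speech")
    music = music[: len(speech)]  # trim music to the speech length
    p_speech = mean_power(speech)
    p_music = mean_power(music)
    if p_speech == 0 or p_music == 0:
        raise ValueError("both signals must have non-zero power")
    # Solve p_speech / (gain^2 * p_music) = 10^(snr_db / 10) for gain.
    gain = math.sqrt(p_speech / (p_music * 10 ** (snr_db / 10)))
    mixture = [s + gain * m for s, m in zip(speech, music)]
    return mixture, gain


# Toy stand-ins for real recordings (assumed, not data from the paper).
speech = [math.sin(0.05 * n) for n in range(2000)]
music = [0.3 * math.sin(0.21 * n + 1.0) for n in range(2000)]

low_snr_mix, low_gain = mix_at_snr(speech, music, 0.0)     # 0 dB, as in the low-SNR case
high_snr_mix, high_gain = mix_at_snr(speech, music, 10.0)  # about 10 dB
```

Under the same assumptions, mixtures generated at several SNR values could serve as multi-condition training data, or be paired with the clean speech as input and target for a denoising autoencoder.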
Pages: 5210-5214
Number of pages: 5