ROBUST AUTOMATIC RECOGNITION OF SPEECH WITH BACKGROUND MUSIC

Times cited: 0
Authors
Malek, Jiri [1]
Zdansky, Jindrich [1]
Cerva, Petr [1]
Affiliation
[1] Technical University of Liberec, Faculty of Mechatronics, Informatics and Interdisciplinary Studies, Studentska 2, Liberec 46117, Czech Republic
Keywords
Robust recognition; background music; feature enhancement; denoising autoencoder; multi-condition training
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
This paper addresses Automatic Speech Recognition (ASR) of speech with music in the background, a condition in which recognition accuracy may deteriorate significantly. To improve the robustness of ASR in this task, e.g. for broadcast news transcription or subtitle creation, we adopt two approaches: 1) multi-condition training of the acoustic models, and 2) denoising autoencoders followed by acoustic model training on the preprocessed data. In the latter case, two types of autoencoders are considered: a fully connected network and a convolutional network. The presented experimental results show that all the investigated techniques significantly improve the recognition of speech distorted by music. For example, for artificial mixtures of speech and electronic music at a low Signal-to-Noise Ratio (SNR) of 0 dB, we achieved an absolute improvement in accuracy of 35.8%. For real-world broadcast news with a high SNR (about 10 dB), we achieved an improvement of 2.4%. An important advantage of the studied approaches is that they cause only a small loss of accuracy on clean speech (a decrease of about 1%).
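The abstract evaluates artificial mixtures of speech and music at a fixed SNR (e.g. 0 dB). The following is a minimal sketch of how such a mixture could be generated. It assumes the common power-ratio definition SNR = 10 log10(P_speech / P_music) computed over the whole utterance. This record does not state the authors' exact mixing procedure, and the function names (`mean_power`, `mix_at_snr`, `measured_snr_db`) are hypothetical.

```python
import math
from typing import List, Tuple


def mean_power(signal: List[float]) -> float:
    """Average of the squared samples."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(speech: List[float], music: List[float],
               snr_db: float) -> Tuple[List[float], float]:
    """Return (speech + g * music, g), with the gain g chosen so that
    10 * log10(P_speech / P_scaled_music) equals snr_db."""
    if not speech or not music:
        raise ValueError("speech and music must be non-empty")
    # Loop the music if it is shorter than the utterance, then trim it
    # (an illustrative choice, not taken from the paper).
    reps = -(-len(speech) // len(music))  # ceiling division
    music = (music * reps)[:len(speech)]
    p_speech = mean_power(speech)
    p_music = mean_power(music)
    if p_music == 0.0:
        raise ValueError("music segment is silent")
    # P_speech / (g^2 * P_music) = 10^(snr_db / 10), solved for g
    gain = math.sqrt(p_speech / (p_music * 10 ** (snr_db / 10)))
    mixture = [s + gain * m for s, m in zip(speech, music)]
    return mixture, gain


def measured_snr_db(speech: List[float], mixture: List[float]) -> float:
    """SNR of a mixture, treating (mixture - speech) as the noise."""
    noise = [y - s for y, s in zip(mixture, speech)]
    return 10 * math.log10(mean_power(speech) / mean_power(noise))
```

Under the usual setup, multi-condition training would pool such mixtures (at several SNRs) with clean data for acoustic model training. A denoising autoencoder would instead be trained on pairs of features extracted from the mixture (input) and from the clean speech (target).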
Pages: 5210-5214
Page count: 5