ROBUST AUTOMATIC RECOGNITION OF SPEECH WITH BACKGROUND MUSIC

Cited by: 0

Authors:

Malek, Jiri [1]

Zdansky, Jindrich [1]

Cerva, Petr [1]

Affiliation:

[1] Tech Univ Liberec, Fac Mechatron Informat & Interdisciplinary Studie, Studentska 2, Liberec 46117, Czech Republic

Source:

2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2017

Keywords:

Robust recognition; background music; feature enhancement; denoising autoencoder; multi-condition training;

DOI:

Not available

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Subject classification code:

070206 ; 082403 ;

Abstract:

This paper addresses the task of Automatic Speech Recognition (ASR) with music in the background, where recognition accuracy may deteriorate significantly. To improve the robustness of ASR in this task, e.g. for broadcast news transcription or subtitle creation, we adopt two approaches: 1) multi-condition training of the acoustic models and 2) denoising autoencoders followed by acoustic model training on the preprocessed data. In the latter case, two types of autoencoders are considered: the fully connected and the convolutional network. The presented experimental results show that all the investigated techniques significantly improve the recognition of speech distorted by music. For example, in the case of artificial mixtures of speech and electronic music (low Signal-to-Noise Ratio (SNR) of 0 dB), we achieved an absolute improvement in accuracy of 35.8 %. For real-world broadcast news and a high SNR (about 10 dB), we achieved an improvement of 2.4 %. An important advantage of the studied approaches is that they do not substantially deteriorate the accuracy in scenarios with clean speech (the decrease is about 1 %).
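The abstract refers to artificial mixtures of speech and music at a fixed SNR (for example 0 dB). This record does not describe how the authors built those mixtures, so the following is only a minimal sketch of the standard power-ratio definition of SNR. The function names (`mean_power`, `mix_at_snr`, `measure_snr_db`) and the toy sine-wave signals are hypothetical and serve only as illustration.

```python
import math


def mean_power(signal):
    """Mean power of a signal: the average of its squared samples."""
    return sum(x * x for x in signal) / len(signal)


def measure_snr_db(speech, noise):
    """SNR in dB, defined as 10 * log10(P_speech / P_noise)."""
    return 10.0 * math.log10(mean_power(speech) / mean_power(noise))


def mix_at_snr(speech, music, target_snr_db):
    """Scale `music` so the speech-to-music power ratio equals
    `target_snr_db`, then add it to `speech` sample by sample.

    Returns the mixture and the scaled music (assumed procedure,
    not taken from the paper).
    """
    if len(speech) != len(music):
        raise ValueError("speech and music must have the same length")
    p_speech = mean_power(speech)
    p_music = mean_power(music)
    if p_speech == 0 or p_music == 0:
        raise ValueError("both signals must have non-zero power")
    # Music power required to reach the target SNR.
    target_music_power = p_speech / (10.0 ** (target_snr_db / 10.0))
    gain = math.sqrt(target_music_power / p_music)
    scaled_music = [gain * m for m in music]
    mixture = [s + m for s, m in zip(speech, scaled_music)]
    return mixture, scaled_music


# Toy stand-ins for real recordings: two sine waves of different frequency.
speech = [math.sin(2 * math.pi * 5 * n / 100) for n in range(100)]
music = [0.2 * math.sin(2 * math.pi * 13 * n / 100) for n in range(100)]

mixture_0db, music_0db = mix_at_snr(speech, music, 0.0)
mixture_10db, music_10db = mix_at_snr(speech, music, 10.0)
```

At 0 dB the scaled music carries the same power as the speech, while at 10 dB it carries one tenth of it, which is why the low-SNR condition is the harder one for a recognizer.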


Pages: 5210-5214

Number of pages: 5
