EFFECTS OF LOMBARD REFLEX ON THE PERFORMANCE OF DEEP-LEARNING-BASED AUDIO-VISUAL SPEECH ENHANCEMENT SYSTEMS

Cited by: 0

Authors:

Michelsanti, Daniel [1]

Tan, Zheng-Hua [1]

Sigurdsson, Sigurdur [2]

Jensen, Jesper [1, 2]

Affiliations:

[1] Aalborg Univ, Dept Elect Syst, Aalborg, Denmark

[2] Oticon AS, Copenhagen, Denmark

Source:

2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2019

Keywords:

Audio-visual speech enhancement; deep learning; Lombard effect; RECOGNITION; NOISE; AUDIO;

DOI:

Not available

Chinese Library Classification number:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

Humans tend to change their way of speaking when they are immersed in a noisy environment, a reflex known as the Lombard effect. Current speech enhancement systems based on deep learning do not usually take this change in speaking style into account, because they are trained with neutral (non-Lombard) speech utterances recorded under quiet conditions, to which noise is artificially added. In this paper, we investigate the effects that the Lombard reflex has on the performance of audio-visual speech enhancement systems based on deep learning. The results show a performance gap of up to approximately 5 dB between the systems trained on neutral speech and the ones trained on Lombard speech. This indicates the benefit of taking the mismatch between neutral and Lombard speech into account in the design of audio-visual speech enhancement systems.


Pages: 6615-6619

Number of pages: 5

