EFFECTS OF LOMBARD REFLEX ON THE PERFORMANCE OF DEEP-LEARNING-BASED AUDIO-VISUAL SPEECH ENHANCEMENT SYSTEMS

Cited: 0
Authors
Michelsanti, Daniel [1]
Tan, Zheng-Hua [1]
Sigurdsson, Sigurdur [2]
Jensen, Jesper [1,2]
Affiliations
[1] Aalborg Univ, Dept Elect Syst, Aalborg, Denmark
[2] Oticon AS, Copenhagen, Denmark
Keywords
Audio-visual speech enhancement; deep learning; Lombard effect; RECOGNITION; NOISE; AUDIO
DOI
Not available
Chinese Library Classification
O42 [Acoustics]
Discipline codes
070206; 082403
Abstract
Humans tend to change their way of speaking when immersed in a noisy environment, a reflex known as the Lombard effect. Current deep-learning-based speech enhancement systems do not usually take this change in speaking style into account, because they are trained on neutral (non-Lombard) speech utterances recorded under quiet conditions to which noise is artificially added. In this paper, we investigate the effects that the Lombard reflex has on the performance of audio-visual speech enhancement systems based on deep learning. The results show a performance gap of as much as approximately 5 dB between systems trained on neutral speech and those trained on Lombard speech. This indicates the benefit of accounting for the mismatch between neutral and Lombard speech in the design of audio-visual speech enhancement systems.
Pages: 6615 - 6619
Number of pages: 5
Related papers (50 total)
  • [41] A Review of Recent Advances on Deep Learning Methods for Audio-Visual Speech Recognition
    Ivanko, Denis
    Ryumin, Dmitry
    Karpov, Alexey
    [J]. MATHEMATICS, 2023, 11 (12)
  • [42] The impact of the Lombard effect on audio and visual speech recognition systems
    Marxer, Ricard
    Barker, Jon
    Alghamdi, Najwa
    Maddock, Steve
    [J]. SPEECH COMMUNICATION, 2018, 100 : 58 - 68
  • [43] DEEP AUDIO-VISUAL SPEECH SEPARATION WITH ATTENTION MECHANISM
    Li, Chenda
    Qian, Yanmin
    [J]. 2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 7314 - 7318
  • [44] Investigating the Lombard Effect Influence on End-to-End Audio-Visual Speech Recognition
    Ma, Pingchuan
    Petridis, Stavros
    Pantic, Maja
    [J]. INTERSPEECH 2019, 2019, : 4090 - 4094
  • [45] Mixture of Inference Networks for VAE-Based Audio-Visual Speech Enhancement
    Sadeghi, Mostafa
    Alameda-Pineda, Xavier
    [J]. IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2021, 69 : 1899 - 1909
  • [46] Multi-Task Joint Learning for Embedding Aware Audio-Visual Speech Enhancement
    Wang, Chenxi
    Chen, Hang
    Du, Jun
    Yin, Baocai
    Pan, Jia
    [J]. 2022 13TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2022, : 255 - 259
  • [47] Audio-Visual Database for Spanish-Based Speech Recognition Systems
    Cordova-Esparza, Diana-Margarita
    Terven, Juan
    Romero, Alejandro
    Marcela Herrera-Navarro, Ana
    [J]. ADVANCES IN SOFT COMPUTING, MICAI 2019, 2019, 11835 : 452 - 460
  • [48] The Conversation: Deep Audio-Visual Speech Enhancement
    Afouras, Triantafyllos
    Chung, Joon Son
    Zisserman, Andrew
    [J]. 19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 3244 - 3248
  • [49] Multimodal Learning Using 3D Audio-Visual Data for Audio-Visual Speech Recognition
    Su, Rongfeng
    Wang, Lan
    Liu, Xunying
    [J]. 2017 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING (IALP), 2017, : 40 - 43
  • [50] Speech enhancement and recognition in meetings with an audio-visual sensor array
    Maganti, Hari Krishna
    Gatica-Perez, Daniel
    McCowan, Iain
    [J]. IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2007, 15 (08): 2257 - 2269