DNN driven Speaker Independent Audio-Visual Mask Estimation for Speech Separation

Cited by: 13

Authors:

Gogate, Mandar [1]

Adeel, Ahsan [1]

Marxer, Ricard [2,3]

Barker, Jon [3]

Hussain, Amir [1]

Affiliations:

[1] Univ Stirling, Stirling, Scotland

[2] Aix Marseille Univ, Univ Toulon, CNRS, LIS, Marseille, France

[3] Univ Sheffield, Sheffield, S Yorkshire, England

Source:

19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES | 2018

Funding:

Engineering and Physical Sciences Research Council (EPSRC), UK;

Keywords:

Speech Separation; Binary Mask Estimation; Deep Neural Network; Speech Enhancement; NOISE;

DOI:

10.21437/Interspeech.2018-2516

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The human auditory cortex excels at selectively suppressing background noise to focus on a target speaker. Selective attention in the brain is known to contextually exploit the available audio and visual cues to better focus on the target speaker while filtering out other noise. In this study, we propose a novel deep neural network (DNN) based audio-visual (AV) mask estimation model. The proposed model contextually integrates the temporal dynamics of both audio and noise-immune visual features for improved mask estimation and speech separation. For optimal AV feature extraction and ideal binary mask (IBM) estimation, a hybrid DNN architecture is exploited to leverage the complementary strengths of a stacked long short-term memory (LSTM) network and a convolutional LSTM network. Comparative simulation results in terms of speech quality and intelligibility demonstrate a significant performance improvement of the proposed AV mask estimation model over audio-only and visual-only mask estimation approaches in both speaker-dependent and speaker-independent scenarios.
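
For readers unfamiliar with the ideal binary mask (IBM) that the abstract names as the estimation target, the following is a minimal sketch of the commonly used definition: a time-frequency unit is kept (1) when its local signal-to-noise ratio exceeds a threshold, and discarded (0) otherwise. This record does not state the threshold, features, or framing used by the authors, so the function name `ideal_binary_mask`, the 0 dB default `lc_db`, the `eps` guard, and the toy magnitudes below are all illustrative assumptions, not the paper's configuration.

```python
import numpy as np


def ideal_binary_mask(speech_mag, noise_mag, lc_db=0.0, eps=1e-12):
    """Return a 0/1 mask that is 1 where the local SNR (in dB) exceeds lc_db.

    speech_mag, noise_mag: magnitude spectrograms of equal shape
    (frames x frequency bins). lc_db is a hypothetical local criterion.
    """
    speech_pow = np.asarray(speech_mag, dtype=float) ** 2
    noise_pow = np.asarray(noise_mag, dtype=float) ** 2
    # eps avoids division by zero and log of zero in silent units.
    local_snr_db = 10.0 * np.log10((speech_pow + eps) / (noise_pow + eps))
    return (local_snr_db > lc_db).astype(np.float32)


# Toy example: 2 frames x 3 frequency bins (made-up magnitudes).
speech = np.array([[1.0, 0.1, 2.0],
                   [0.0, 3.0, 0.5]])
noise = np.array([[0.5, 1.0, 2.5],
                  [1.0, 1.0, 0.1]])
mask = ideal_binary_mask(speech, noise)
print(mask)
```

In a mask-estimation setup such as the one the abstract describes, a mask of this kind would serve as the training target, and the network would predict it from noisy audio and visual features, since clean speech and noise are not available separately at test time.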


Pages: 2723-2727

Number of pages: 5

