Bayesian separation of audio-visual speech sources

Cited by: 0

Authors:

Rajaram, S [1]

Nefian, AV [1]

Huang, TS [1]

Affiliations:

[1] Univ Illinois, Urbana, IL 61801 USA

Source:

2004 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL V, PROCEEDINGS: DESIGN AND IMPLEMENTATION OF SIGNAL PROCESSING SYSTEMS INDUSTRY TECHNOLOGY TRACKS MACHINE LEARNING FOR SIGNAL PROCESSING MULTIMEDIA SIGNAL PROCESSING SIGNAL PROCESSING FOR EDUCATION | 2004

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In this paper we investigate the use of audio and visual rather than only audio features for the task of speech separation in acoustically noisy environments. The success of existing independent component analysis (ICA) systems for the separation of a large variety of signals, including speech, is often limited by the ability of this technique to handle noise. In this paper, we introduce a Bayesian model for the mixing process that describes both the bimodality and the time dependency of speech sources. Our experimental results show that the online demixing process presented here outperforms both the ICA and the audio-only Bayesian model at all levels of noise.

Citation

Pages: 657-660

Page count: 4

50 items in total

[1] Separation of audio-visual speech sources: A new approach exploiting the audio-visual coherence of speech stimuli
Sodoyer, D
Schwartz, JL
Girin, L
Klinkisch, J
Jutten, C
[J]. EURASIP JOURNAL ON APPLIED SIGNAL PROCESSING, 2002, 2002 (11) : 1165 - 1173
[2] Separation of Audio-Visual Speech Sources: A New Approach Exploiting the Audio-Visual Coherence of Speech Stimuli
David Sodoyer
Jean-Luc Schwartz
Laurent Girin
Jacob Klinkisch
Christian Jutten
[J]. EURASIP Journal on Advances in Signal Processing, 2002
[3] Separation of audio-visual speech sources: A new approach exploiting the audio-visual coherence of speech stimuli
[J]. Sodoyer, D. (sodoyer@icp.inpg.fr), 1600, Hindawi Publishing Corporation (2002):
[4] Audio-Visual Deep Clustering for Speech Separation
Lu, Rui
Duan, Zhiyao
Zhang, Changshui
[J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2019, 27 (11) : 1697 - 1712
[5] Dynamic Bayesian Networks for Audio-Visual Speech Recognition
Ara V. Nefian
Luhong Liang
Xiaobo Pi
Xiaoxing Liu
Kevin Murphy
[J]. EURASIP Journal on Advances in Signal Processing, 2002
[6] Dynamic Bayesian networks for audio-visual speech recognition
Nefian, AV
Liang, LH
Pi, XB
Liu, XX
Murphy, K
[J]. EURASIP JOURNAL ON APPLIED SIGNAL PROCESSING, 2002, 2002 (11) : 1274 - 1288
[7] Developing an audio-visual speech source separation algorithm
Sodoyer, D
Girin, L
Jutten, C
Schwartz, JL
[J]. SPEECH COMMUNICATION, 2004, 44 (1-4) : 113 - 125
[8] DEEP AUDIO-VISUAL SPEECH SEPARATION WITH ATTENTION MECHANISM
Li, Chenda
Qian, Yanmin
[J]. 2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 7314 - 7318
[9] Active Audio-Visual Separation of Dynamic Sound Sources
Majumder, Sagnik
Grauman, Kristen
[J]. COMPUTER VISION, ECCV 2022, PT XXXIX, 2022, 13699 : 551 - 569
[10] An audio-visual distance for audio-visual speech vector quantization
Girin, L
Foucher, E
Feng, G
[J]. 1998 IEEE SECOND WORKSHOP ON MULTIMEDIA SIGNAL PROCESSING, 1998, : 523 - 528
