Bayesian separation of audio-visual speech sources

Authors
Rajaram, S [1]
Nefian, AV [1]
Huang, TS [1]
Affiliation
[1] Univ Illinois, Urbana, IL 61801 USA
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we investigate the use of audio-visual features, rather than audio features alone, for speech separation in acoustically noisy environments. Independent component analysis (ICA) has been used successfully to separate a wide variety of signals, including speech, but its performance is often limited by its poor handling of noise. We introduce a Bayesian model of the mixing process that captures both the bimodality (audio and visual) and the time dependency of speech sources. Our experimental results show that the online demixing process presented here outperforms both ICA and the audio-only Bayesian model at all noise levels.
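The abstract contrasts memoryless ICA demixing with a Bayesian model that exploits the time dependency of the sources. The sketch below is a toy illustration of that contrast only, under assumptions that are not taken from the paper: two audio sources modelled as AR(1) processes, a known 2x2 mixing matrix `A`, white Gaussian sensor noise, and a standard Kalman filter standing in for an "online Bayesian demixer". It ignores the visual stream entirely and is not the authors' model. All names and values (`PHI`, `Q`, `R`, `A`, `simulate`, `demix_inverse`, `demix_kalman`) are hypothetical.

```python
import random

# Hypothetical toy parameters (not from the paper).
PHI = 0.95            # AR(1) coefficient: the time dependency of each source
Q = 1.0 - PHI ** 2    # process-noise variance, giving each source unit variance
R = 0.5               # variance of the white Gaussian sensor noise
A = [[1.0, 0.6],      # known 2x2 mixing matrix (2 sources, 2 microphones)
     [0.5, 1.0]]


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def matvec(X, v):
    return [X[0][0] * v[0] + X[0][1] * v[1], X[1][0] * v[0] + X[1][1] * v[1]]


def transpose(X):
    return [[X[0][0], X[1][0]], [X[0][1], X[1][1]]]


def add_diag(X, d):
    return [[X[0][0] + d, X[0][1]], [X[1][0], X[1][1] + d]]


def inv2(X):
    det = X[0][0] * X[1][1] - X[0][1] * X[1][0]
    return [[X[1][1] / det, -X[0][1] / det], [-X[1][0] / det, X[0][0] / det]]


def simulate(n, seed=0):
    """Generate n samples of sources s_t and noisy mixtures x_t = A s_t + v_t."""
    rng = random.Random(seed)
    s = [0.0, 0.0]
    sources, mixtures = [], []
    for _ in range(n):
        s = [PHI * si + rng.gauss(0.0, Q ** 0.5) for si in s]
        x = [xi + rng.gauss(0.0, R ** 0.5) for xi in matvec(A, s)]
        sources.append(s)
        mixtures.append(x)
    return sources, mixtures


def demix_inverse(mixtures):
    """Memoryless demixing with W = A^{-1}, i.e. an ICA that recovered A exactly."""
    W = inv2(A)
    return [matvec(W, x) for x in mixtures]


def demix_kalman(mixtures):
    """Online Bayesian demixing: a Kalman filter under the AR(1) source prior."""
    m = [0.0, 0.0]
    P = [[1.0, 0.0], [0.0, 1.0]]
    At = transpose(A)
    estimates = []
    for x in mixtures:
        # Predict: propagate the previous posterior through s_t = PHI s_{t-1} + w_t.
        m_pred = [PHI * mi for mi in m]
        P_pred = add_diag([[PHI ** 2 * p for p in row] for row in P], Q)
        # Update: weigh the new mixture against the prediction via the Kalman gain.
        S = add_diag(matmul(matmul(A, P_pred), At), R)
        K = matmul(matmul(P_pred, At), inv2(S))
        innovation = [xi - yi for xi, yi in zip(x, matvec(A, m_pred))]
        m = [mp + c for mp, c in zip(m_pred, matvec(K, innovation))]
        KA = matmul(K, A)
        I_minus_KA = [[(1.0 if i == j else 0.0) - KA[i][j] for j in range(2)]
                      for i in range(2)]
        P = matmul(I_minus_KA, P_pred)
        estimates.append(m)
    return estimates


def mse(estimates, sources):
    """Mean squared error over all samples and both sources."""
    total = sum((e - s) ** 2
                for est, src in zip(estimates, sources)
                for e, s in zip(est, src))
    return total / (2 * len(sources))


sources, mixtures = simulate(2000, seed=1)
print("inverse MSE:", round(mse(demix_inverse(mixtures), sources), 3))
print("Kalman MSE: ", round(mse(demix_kalman(mixtures), sources), 3))
```

Because W x_t = s_t + A^{-1} v_t, the memoryless estimate passes the sensor noise straight through (and amplifies it here, since A^{-1} has entries larger than 1), whereas the filter weighs each new mixture against a prediction from the previous estimate. Adding visual observations to the model, as the paper does, is outside the scope of this sketch.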
Pages: 657-660
Number of pages: 4