Developing an audio-visual speech source separation algorithm

Times cited: 24
Authors
Sodoyer, D
Girin, L
Jutten, C
Schwartz, JL
Affiliations
[1] Univ Grenoble 3, INPG, ICP, CNRS UMR 5009, F-38031 Grenoble 1, France
[2] Univ Grenoble 1, INPG, LIS, CNRS UMR 5083, F-38041 Grenoble, France
Keywords
blind source separation; audio-visual coherence; speech enhancement; audio-visual joint probability; spectral information
DOI
10.1016/j.specom.2004.10.002
CLC classification number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Looking at the speaker's face helps listeners hear a speech signal better and extract it from competing sources before identification. This suggests new speech enhancement or extraction techniques that exploit the audio-visual coherence of speech stimuli. In this paper, we present a novel algorithm that plugs audio-visual coherence, estimated with statistical tools, into classical blind source separation algorithms, and we describe its assessment. We show, in the case of additive mixtures, that this algorithm outperforms classical blind tools both when there are as many sensors as sources and when there are fewer sensors than sources. Audio-visual coherence makes it possible to focus on the speech source to be extracted. It may also be used at the output of a classical source separation algorithm to select the "best" sensor with reference to a target source. (C) 2004 Elsevier B.V. All rights reserved.
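The abstract's last point, using audio-visual coherence at the output of a classical source separation algorithm to pick the output that carries the target speaker, can be illustrated with a small sketch. It is not the authors' method: the paper estimates audio-visual coherence with statistical tools (its keywords point to an audio-visual joint probability on spectral information), whereas this sketch swaps in a much simpler stand-in score, the Pearson correlation between each output's per-frame energy and a per-frame lip-opening parameter. The synthetic signals, the lip track, the frame length, the gains and all function names (`frame_energies`, `pearson`, `select_target_output`) are hypothetical.

```python
import math
import random


def frame_energies(signal, frame_len):
    """Mean energy of each non-overlapping frame; trailing samples are dropped."""
    n_frames = len(signal) // frame_len
    return [
        sum(v * v for v in signal[i * frame_len:(i + 1) * frame_len]) / frame_len
        for i in range(n_frames)
    ]


def pearson(a, b):
    """Pearson correlation of two sequences, truncated to the shorter length."""
    n = min(len(a), len(b))
    a, b = a[:n], b[:n]
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0.0 or var_b == 0.0:
        return 0.0  # undefined for a constant sequence; treat as no coherence
    return cov / math.sqrt(var_a * var_b)


def select_target_output(outputs, lip_track, frame_len):
    """Return (index, scores): the output whose frame energy best tracks the lips.

    Hypothetical stand-in coherence score (NOT the paper's statistical model):
    correlation between each output's per-frame energy and a lip parameter.
    """
    scores = [pearson(frame_energies(y, frame_len), lip_track) for y in outputs]
    best = max(range(len(outputs)), key=lambda k: scores[k])
    return best, scores


# --- Synthetic demo (all signals and parameters are hypothetical) ---
rng = random.Random(0)
frame_len, n_frames = 40, 50

# Hypothetical lip-opening parameter, one value per video frame, in [0, 1].
lip = [0.5 + 0.5 * math.sin(2 * math.pi * k / 10) for k in range(n_frames)]

# Target "speech": Gaussian noise whose amplitude follows the lip track per frame.
speech = [lip[i // frame_len] * rng.gauss(0.0, 1.0) for i in range(frame_len * n_frames)]
# Interfering source: stationary Gaussian noise, unrelated to the lips.
interferer = [rng.gauss(0.0, 1.0) for _ in range(frame_len * n_frames)]

# A blind separation algorithm recovers sources only up to permutation and
# scale, so its outputs are simulated as the sources in swapped order with
# arbitrary gains.
outputs = [[0.3 * v for v in interferer], [-2.0 * v for v in speech]]

best, scores = select_target_output(outputs, lip, frame_len)
print(best, [round(s, 2) for s in scores])
```

Only the second output's energy follows the lip track, so the sketch selects index 1 regardless of the sign and gain applied to it: squaring removes the sign, and correlation ignores positive rescaling of the energy. A real system would replace the correlation score with a proper statistical audio-visual model.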
Pages: 113-125
Number of pages: 13
Related papers (50 in total)
  • [21] Expressive audio-visual speech
    Bevacqua, E
    Pelachaud, C
    COMPUTER ANIMATION AND VIRTUAL WORLDS, 2004, 15 (3-4): 297 - 304
  • [22] Audio-Visual Speech Codecs: Rethinking Audio-Visual Speech Enhancement by Re-Synthesis
    Yang, Karren
    Markovic, Dejan
    Krenn, Steven
    Agrawal, Vasu
    Richard, Alexander
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2022: 8217 - 8227
  • [23] Effects of aging on audio-visual speech integration
    Huyse, Aurelie
    Leybaert, Jacqueline
    Berthommier, Frederic
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2014, 136 (04): 1918 - 1931
  • [24] Learning Audio-Visual Dynamics Using Scene Graphs for Audio Source Separation
    Chatterjee, Moitreya
    Ahuja, Narendra
    Cherian, Anoop
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022
  • [25] Audio-Visual Based Online Multi-Source Separation
    Ong, Jonah
    Vo, Ba Tuong
    Nordholm, Sven
    Vo, Ba-Ngu
    Moratuwage, Diluka
    Shim, Changbeom
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2022, 30: 1219 - 1234
  • [26] Image-driven Audio-visual Universal Source Separation
    Li, Chenxing
    Bai, Ye
    Wang, Yang
    Deng, Feng
    Zhao, Yuanyuan
    Zhang, Zhuo
    Wang, Xiaorui
    INTERSPEECH 2023, 2023: 3729 - 3733
  • [27] Audio-visual speech recognition based on joint training with audio-visual speech enhancement for robust speech recognition
    Hwang, Jung-Wook
    Park, Jeongkyun
    Park, Rae-Hong
    Park, Hyung-Min
    APPLIED ACOUSTICS, 2023, 211
  • [28] An audio-visual speech recognition system for testing new audio-visual databases
    Pao, Tsang-Long
    Liao, Wen-Yuan
    VISAPP 2006: PROCEEDINGS OF THE FIRST INTERNATIONAL CONFERENCE ON COMPUTER VISION THEORY AND APPLICATIONS, VOL 2, 2006: 192+
  • [29] LEARNING CONTEXTUALLY FUSED AUDIO-VISUAL REPRESENTATIONS FOR AUDIO-VISUAL SPEECH RECOGNITION
    Zhang, Zi-Qiang
    Zhang, Jie
    Zhang, Jian-Shu
    Wu, Ming-Hui
    Fang, Xin
    Dai, Li-Rong
    2022 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2022: 1346 - 1350
  • [30] Audio-Visual Speech Separation in Noisy Environments with a Lightweight Iterative Model
    Martel, Hector
    Richter, Julius
    Li, Kai
    Hu, Xiaolin
    Gerkmann, Timo
    INTERSPEECH 2023, 2023: 1673 - 1677