Developing an audio-visual speech source separation algorithm

Cited by: 24
Authors
Sodoyer, D
Girin, L
Jutten, C
Schwartz, JL
Affiliations
[1] Univ Grenoble 3, INPG, ICP, CNRS UMR 5009, F-38031 Grenoble 1, France
[2] Univ Grenoble 1, INPG, LIS, CNRS UMR 5083, F-38041 Grenoble, France
Keywords
blind source separation; audio-visual coherence; speech enhancement; audio-visual joint probability; spectral information;
DOI
10.1016/j.specom.2004.10.002
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
Looking at a speaker's face helps a listener hear a speech signal better and extract it from competing sources before identification. This suggests that new speech enhancement or extraction techniques could exploit the audio-visual coherence of speech stimuli. In this paper, we present a novel algorithm that plugs audio-visual coherence, estimated by statistical tools, into classical blind source separation algorithms, and we describe its assessment. We show, in the case of additive mixtures, that this algorithm performs better than classical blind tools both when there are as many sensors as sources and when there are fewer sensors than sources. Audio-visual coherence makes it possible to focus on the speech source to be extracted. It may also be used at the output of a classical source separation algorithm to select the "best" sensor with reference to a target source. (C) 2004 Elsevier B.V. All rights reserved.
Pages: 113-125
Number of pages: 13
Related papers
50 in total
  • [41] Audio-visual speech recognition by speechreading
    Zhang, XZ
    Mersereau, RM
    Clements, MA
    DSP 2002: 14TH INTERNATIONAL CONFERENCE ON DIGITAL SIGNAL PROCESSING PROCEEDINGS, VOLS 1 AND 2, 2002, : 1069 - 1072
  • [42] Lite Audio-Visual Speech Enhancement
    Chuang, Shang-Yi
    Tsao, Yu
    Lo, Chen-Chou
    Wang, Hsin-Min
    INTERSPEECH 2020, 2020, : 1131 - 1135
  • [43] Audio-visual enhancement of speech in noise
    Girin, L
    Schwartz, JL
    Feng, G
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2001, 109 (06): : 3007 - 3020
  • [44] Audio-visual speech processing and attention
    Sams, M
    PSYCHOPHYSIOLOGY, 2003, 40 : S5 - S6
  • [45] Audio-Visual Speech Recognition in Noisy Audio Environments
    Palecek, Karel
    Chaloupka, Josef
    2013 36TH INTERNATIONAL CONFERENCE ON TELECOMMUNICATIONS AND SIGNAL PROCESSING (TSP), 2013, : 484 - 487
  • [46] Visually Guided Sound Source Separation With Audio-Visual Predictive Coding
    Song, Zengjie
    Zhang, Zhaoxiang
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024, 35 (11) : 15528 - 15542
  • [48] Move2Hear: Active Audio-Visual Source Separation
    Majumder, Sagnik
    Al-Halah, Ziad
    Grauman, Kristen
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 275 - 285
  • [49] Audio-Visual Speech Modeling for Continuous Speech Recognition
    Dupont, Stephane
    Luettin, Juergen
    IEEE TRANSACTIONS ON MULTIMEDIA, 2000, 2 (03) : 141 - 151
  • [50] Audio-visual speech perception without speech cues
    Saldana, HM
    Pisoni, DB
    Fellowes, JM
    Remez, RE
    ICSLP 96 - FOURTH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, VOLS 1-4, 1996, : 2187 - 2190