Developing an audio-visual speech source separation algorithm

Cited by: 24
Authors
Sodoyer, D
Girin, L
Jutten, C
Schwartz, JL
Affiliations
[1] Univ Grenoble 3, INPG, ICP, CNRS UMR 5009, F-38031 Grenoble 1, France
[2] Univ Grenoble 1, INPG, LIS, CNRS UMR 5083, F-38041 Grenoble, France
Keywords
blind source separation; audio-visual coherence; speech enhancement; audio-visual joint probability; spectral information
DOI
10.1016/j.specom.2004.10.002
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Looking at the speaker's face helps a listener hear a speech signal better and extract it from competing sources before identification. This could lead to new speech enhancement or extraction techniques that exploit the audio-visual coherence of speech stimuli. In this paper, a novel algorithm that plugs audio-visual coherence, estimated with statistical tools, into classical blind source separation algorithms is presented, and its assessment is described. We show, in the case of additive mixtures, that this algorithm performs better than classical blind tools both when there are as many sensors as sources and when there are fewer sensors than sources. Audio-visual coherence makes it possible to focus on the speech source to be extracted. It may also be used at the output of a classical source separation algorithm to select the "best" sensor with reference to a target source. (C) 2004 Elsevier B.V. All rights reserved.
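The core idea in the abstract, choosing the separating filter whose output is most coherent with the speaker's visible lip movements, can be illustrated with a minimal sketch. It assumes a hypothetical toy scene: two sources (a "speech" signal with a time-varying envelope and a stationary noise), two sensors, an instantaneous additive mixture with an assumed matrix `A`, and a hypothetical lip parameter that tracks the speech envelope. The paper estimates coherence with a statistical model (the keywords describe an audio-visual joint probability built on spectral information); this sketch swaps that in for a much simpler stand-in, the correlation between the log frame energy of the separated output and the lip parameter. It is not the authors' algorithm.

```python
import math
import random


def pearson(u, v):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)


def make_toy_scene(n_frames=200, frame_len=160, seed=0):
    """Hypothetical toy data: speech-like source s1 with a random per-frame
    envelope, stationary noise s2, a lip parameter tracking log(envelope),
    and a 2x2 instantaneous additive mixture x = A s."""
    rng = random.Random(seed)
    log_env = [rng.uniform(-3.0, 0.0) for _ in range(n_frames)]
    lip = [g + rng.gauss(0.0, 0.05) for g in log_env]  # assumed video feature
    s1, s2 = [], []
    for g in log_env:
        amp = math.exp(g)
        s1.extend(amp * rng.gauss(0.0, 1.0) for _ in range(frame_len))
        s2.extend(rng.gauss(0.0, 1.0) for _ in range(frame_len))
    A = [[1.0, 0.6], [0.5, 1.0]]  # assumed mixing matrix
    x1 = [A[0][0] * a + A[0][1] * b for a, b in zip(s1, s2)]
    x2 = [A[1][0] * a + A[1][1] * b for a, b in zip(s1, s2)]
    return x1, x2, lip, A, frame_len


def frame_stats(x1, x2, frame_len):
    """Per-frame sums of x1*x1, x1*x2 and x2*x2, so the frame energy of any
    projection w1*x1 + w2*x2 can be computed without revisiting samples."""
    stats = []
    for start in range(0, len(x1), frame_len):
        a = x1[start:start + frame_len]
        b = x2[start:start + frame_len]
        stats.append((sum(p * p for p in a),
                      sum(p * q for p, q in zip(a, b)),
                      sum(q * q for q in b)))
    return stats


def av_coherence(theta, stats, lip):
    """Stand-in coherence score (assumption): correlation between the log frame
    energy of y = cos(theta)*x1 + sin(theta)*x2 and the lip parameter."""
    w1, w2 = math.cos(theta), math.sin(theta)
    log_e = [math.log(w1 * w1 * s11 + 2 * w1 * w2 * s12 + w2 * w2 * s22 + 1e-12)
             for s11, s12, s22 in stats]
    return pearson(log_e, lip)


def separate(x1, x2, lip, frame_len):
    """Grid search over unit-norm demixing vectors, angles -90..89 degrees;
    returns the angle (degrees) with the highest coherence score."""
    stats = frame_stats(x1, x2, frame_len)
    return max(range(-90, 90),
               key=lambda d: av_coherence(math.radians(d), stats, lip))


if __name__ == "__main__":
    x1, x2, lip, A, L = make_toy_scene()
    best_deg = separate(x1, x2, lip, L)
    # The ideal demixing vector is orthogonal to the noise column of A.
    ideal_deg = math.degrees(math.atan2(-A[0][1], A[1][1]))
    print(best_deg, round(ideal_deg, 1))
```

Because the noise floor compresses the low-energy frames of the output, the correlation peaks where the noise contribution is cancelled, so the search lands near the ideal demixing direction without any prior knowledge of `A`. With more sources than sensors, no single direction cancels every interferer, and a richer statistical model such as the one the paper describes would be needed.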
Pages: 113-125
Page count: 13