Developing an audio-visual speech source separation algorithm

Cited by: 24

Authors:

Sodoyer, D

Girin, L

Jutten, C

Schwartz, JL

Affiliations:

[1] Univ Grenoble 3, INPG, ICP, CNRS UMR 5009, F-38031 Grenoble 1, France

[2] Univ Grenoble 1, INPG, LIS, CNRS UMR 5083, F-38041 Grenoble, France

Source:

SPEECH COMMUNICATION | 2004 / Vol. 44 / Issue 1-4

Keywords:

blind source separation; audio-visual coherence; speech enhancement; audio-visual joint probability; spectral information;

DOI:

10.1016/j.specom.2004.10.002

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

Looking at the speaker's face helps a listener hear a speech signal better and extract it from competing sources before identification. This suggests new speech enhancement or extraction techniques that exploit the audio-visual coherence of speech stimuli. In this paper, a novel algorithm that plugs audio-visual coherence, estimated by statistical tools, into classical blind source separation algorithms is presented, and its assessment is described. We show, in the case of additive mixtures, that this algorithm performs better than classical blind tools both when there are as many sensors as sources and when there are fewer sensors than sources. Audio-visual coherence enables a focus on the speech source to extract. It may also be used at the output of a classical source separation algorithm to select the "best" sensor with reference to a target source. (C) 2004 Elsevier B.V. All rights reserved.

Pages: 113-125

Page count: 13
