Audio-visual enhancement of speech in noise

Cited by: 76

Authors:

Girin, L [1]

Schwartz, JL [1]

Feng, G [1]

Affiliation:

[1] Univ Grenoble 3, CNRS UMR 5009, Inst Commun Parlee, INPG, F-38040 Grenoble, France

Source:

JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA | 2001 / Vol. 109 / No. 06

DOI:

10.1121/1.1358887

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Subject classification code:

070206 ; 082403 ;

Abstract:

A key problem for telecommunication or human-machine communication systems concerns speech enhancement in noise. In this domain, a certain number of techniques exist, all of them based on an acoustic-only approach, that is, the processing of the audio corrupted signal using audio information (from the corrupted signal only or additive audio information). In this paper, an audio-visual approach to the problem is considered, since it has been demonstrated in several studies that viewing the speaker's face improves message intelligibility, especially in noisy environments. A speech enhancement prototype system that takes advantage of visual inputs is developed. A filtering process approach is proposed that uses enhancement filters estimated with the help of lip shape information. The estimation process is based on linear regression or simple neural networks using a training corpus. A set of experiments assessed by Gaussian classification and perceptual tests demonstrates that it is indeed possible to enhance simple stimuli (vowel-plosive-vowel sequences) embedded in white Gaussian noise. © 2001 Acoustical Society of America.
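
The linear-regression variant mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the lip shape or the enhancement filters are parameterized, so the three lip features, the six filter bands, the synthetic training data, and the function names `fit_lip_to_gain_map` and `predict_gains` are all assumptions made for illustration only.

```python
import numpy as np


def _with_bias(lip_feats):
    """Append a constant column so the regression includes an offset term."""
    return np.hstack([lip_feats, np.ones((lip_feats.shape[0], 1))])


def fit_lip_to_gain_map(lip_feats, target_gains):
    """Least-squares fit of an affine map from lip features to per-band gains.

    lip_feats:    (n_frames, n_lip_params)  hypothetical lip shape parameters
    target_gains: (n_frames, n_bands)       hypothetical ideal filter gains
    Returns a weight matrix of shape (n_lip_params + 1, n_bands).
    """
    weights, *_ = np.linalg.lstsq(_with_bias(lip_feats), target_gains, rcond=None)
    return weights


def predict_gains(weights, lip_feats):
    """Estimate per-band filter gains for new frames from lip features alone."""
    return _with_bias(lip_feats) @ weights


# Synthetic training corpus (assumption: 3 lip parameters, 6 filter bands).
rng = np.random.default_rng(0)
train_lips = rng.uniform(0.0, 1.0, size=(200, 3))
hidden_map = rng.normal(size=(4, 6))
train_gains = _with_bias(train_lips) @ hidden_map

weights = fit_lip_to_gain_map(train_lips, train_gains)

# Apply the estimated gains to the band energies of a noisy test frame.
test_lips = rng.uniform(0.0, 1.0, size=(1, 3))
noisy_band_energies = rng.uniform(0.5, 1.5, size=(1, 6))
enhanced_band_energies = noisy_band_energies * predict_gains(weights, test_lips)
```

The point of the sketch is only the direction of the mapping: the filter is predicted from visual parameters learned on a training corpus, so it does not depend on a clean estimate of the noise from the audio channel.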

Pages: 3007-3020

Page count: 14

Total: 50 items

[21] Speech enhancement and recognition in meetings with an audio-visual sensor array
Maganti, Hari Krishna
Gatica-Perez, Daniel
McCowan, Iain
[J]. IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2007, 15 (08): 2257 - 2269
[22] Audio-visual speech comprehension in noise with real and virtual speakers
Nirme, Jens
Sahlen, Birgitta
Ahlander, Viveka Lyberg
Brannstrom, Jonas
Haake, Magnus
[J]. SPEECH COMMUNICATION, 2020, 116 : 44 - 55
[23] THE IMPACT OF REMOVING HEAD MOVEMENTS ON AUDIO-VISUAL SPEECH ENHANCEMENT
Kang, Zhiqi
Sadeghi, Mostafa
Horaud, Radu
Alameda-Pineda, Xavier
Donley, Jacob
Kumar, Anurag
[J]. 2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022: 7302 - 7306
[24] Incorporating Ultrasound Tongue Images for Audio-Visual Speech Enhancement
Zheng, Rui-Chen
Ai, Yang
Ling, Zhen-Hua
[J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2024, 32 : 1430 - 1444
[25] Noise Adaptive Stream Weighting in Audio-Visual Speech Recognition
Heckmann, Martin
Berthommier, Frédéric
Kroschel, Kristian
[J]. EURASIP Journal on Advances in Signal Processing, 2002
[26] AUDIO-VISUAL DEEP LEARNING FOR NOISE ROBUST SPEECH RECOGNITION
Huang, Jing
Kingsbury, Brian
[J]. 2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013: 7596 - 7599
[27] An audio-visual speech recognition with a new mandarin audio-visual database
Liao, Wen-Yuan
Pao, Tsang-Long
Chen, Yu-Te
Chang, Tsun-Wei
[J]. INT CONF ON CYBERNETICS AND INFORMATION TECHNOLOGIES, SYSTEMS AND APPLICATIONS/INT CONF ON COMPUTING, COMMUNICATIONS AND CONTROL TECHNOLOGIES, VOL 1, 2007, : 19 - +
[28] Using Twin-HMM-Based Audio-Visual Speech Enhancement as a Front-End for Robust Audio-Visual Speech Recognition
Abdelaziz, Ahmed Hussen
Zeiler, Steffen
Kolossa, Dorothea
[J]. 14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5, 2013: 867 - 871
[29] Expressive audio-visual speech
Bevacqua, E
Pelachaud, C
[J]. COMPUTER ANIMATION AND VIRTUAL WORLDS, 2004, 15 (3-4) : 297 - 304
[30] Effects of aging on audio-visual speech integration
Huyse, Aurelie
Leybaert, Jacqueline
Berthommier, Frederic
[J]. JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2014, 136 (04): 1918 - 1931
