Audio-visual enhancement of speech in noise

Cited by: 76
Authors
Girin, L [1]
Schwartz, JL [1]
Feng, G [1]
Affiliation
[1] Univ Grenoble 3, CNRS UMR 5009, Inst Commun Parlee, INPG, F-38040 Grenoble, France
DOI
10.1121/1.1358887
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline codes
070206; 082403
Abstract
A key problem for telecommunication and human-machine communication systems is speech enhancement in noise. Several techniques exist in this domain, all of them based on an acoustic-only approach, that is, processing the corrupted audio signal using audio information alone (either from the corrupted signal itself or from additional audio information). In this paper, an audio-visual approach to the problem is considered, since several studies have demonstrated that viewing the speaker's face improves message intelligibility, especially in noisy environments. A prototype speech enhancement system that takes advantage of visual inputs is developed. The proposed approach is a filtering process that uses enhancement filters estimated with the help of lip shape information. The estimation is based on linear regression or simple neural networks trained on a corpus. A set of experiments, assessed by Gaussian classification and perceptual tests, demonstrates that it is indeed possible to enhance simple stimuli (vowel-plosive-vowel sequences) embedded in white Gaussian noise. (C) 2001 Acoustical Society of America.
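The abstract does not give the filter equations, so the following is only a minimal sketch of the general idea under loudly stated assumptions, not the authors' method. It assumes lip shape is summarized by two hypothetical parameters (lip width and lip height), that linear regression on a training corpus maps these parameters to the clean-speech power in a few frequency bands, and that the "enhancement filter" is a per-band Wiener gain S / (S + N). Because the noise is white, the noise power N is taken to be the same in every band and known in advance. The function names, the toy corpus, and the band layout are all hypothetical, and the neural-network variant mentioned in the abstract is not shown.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def solve(A: Matrix, B: Matrix) -> Matrix:
    """Solve A @ W = B for W by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    # Work on an augmented copy [A | B] so the inputs are left untouched.
    M = [list(A[i]) + list(B[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system: lip features are collinear")
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]


def fit_lip_to_spectrum(lips: Sequence[Sequence[float]],
                        spectra: Sequence[Sequence[float]]) -> Matrix:
    """Hypothetical least-squares map from lip parameters to clean band powers.

    A leading 1.0 is prepended to each lip vector so the fit has an intercept.
    Returns W with shape (n_lip_params + 1) x n_bands.
    """
    X = [[1.0, *x] for x in lips]
    d, k = len(X[0]), len(spectra[0])
    # Normal equations: (X^T X) W = X^T Y.
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(d)] for i in range(d)]
    XtY = [[sum(X[n][i] * spectra[n][j] for n in range(len(X))) for j in range(k)]
           for i in range(d)]
    return solve(XtX, XtY)


def predict_spectrum(W: Matrix, lip: Sequence[float]) -> List[float]:
    """Estimate clean-speech band powers from one frame of lip parameters."""
    x = [1.0, *lip]
    return [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(len(W[0]))]


def wiener_gains(speech_power: Sequence[float], noise_power: float) -> List[float]:
    """Per-band Wiener gain S / (S + N) for white noise (same N in every band).

    Negative regression outputs are clipped to zero power before use.
    """
    if noise_power <= 0.0:
        raise ValueError("noise_power must be positive")
    return [max(s, 0.0) / (max(s, 0.0) + noise_power) for s in speech_power]


if __name__ == "__main__":
    # Toy, made-up corpus: (lip width, lip height) -> 3 band powers.
    lips = [(4.0, 1.0), (5.0, 0.5), (3.0, 2.0), (4.5, 1.5), (3.5, 0.8)]
    spectra = [[1.0 + 2.0 * w + 3.0 * h, 0.5 * w, 2.0 * h] for w, h in lips]
    W = fit_lip_to_spectrum(lips, spectra)
    estimate = predict_spectrum(W, (4.2, 1.2))
    print(wiener_gains(estimate, noise_power=1.0))
```

In a real system the gains would multiply the short-time spectrum of each noisy frame; the sketch stops at the gains to keep the visual-to-filter estimation step in focus.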
Pages: 3007-3020
Page count: 14