Audio-visual enhancement of speech in noise

Cited by: 76
Authors
Girin, L [1]
Schwartz, JL [1]
Feng, G [1]
Affiliations
[1] Univ Grenoble 3, CNRS UMR 5009, Inst Commun Parlee, INPG, F-38040 Grenoble, France
DOI
10.1121/1.1358887
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
A key problem for telecommunication or human-machine communication systems concerns speech enhancement in noise. In this domain, a number of techniques exist, all of them based on an acoustic-only approach, that is, the processing of the corrupted audio signal using audio information (from the corrupted signal only or from additional audio information). In this paper, an audio-visual approach to the problem is considered, since it has been demonstrated in several studies that viewing the speaker's face improves message intelligibility, especially in noisy environments. A speech enhancement prototype system that takes advantage of visual inputs is developed. A filtering approach is proposed that uses enhancement filters estimated with the help of lip shape information. The estimation process is based on linear regression or simple neural networks using a training corpus. A set of experiments assessed by Gaussian classification and perceptual tests demonstrates that it is indeed possible to enhance simple stimuli (vowel-plosive-vowel sequences) embedded in white Gaussian noise. © 2001 Acoustical Society of America.
Pages: 3007-3020
Page count: 14
Related papers
50 in total
  • [1] Lite Audio-Visual Speech Enhancement
    Chuang, Shang-Yi
    Tsao, Yu
    Lo, Chen-Chou
    Wang, Hsin-Min
    [J]. INTERSPEECH 2020, 2020, : 1131 - 1135
  • [2] Audio-Visual Speech Codecs: Rethinking Audio-Visual Speech Enhancement by Re-Synthesis
    Yang, Karren
    Markovic, Dejan
    Krenn, Steven
    Agrawal, Vasu
    Richard, Alexander
    [J]. 2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2022, : 8217 - 8227
  • [3] Audio-visual speech recognition based on joint training with audio-visual speech enhancement for robust speech recognition
    Hwang, Jung-Wook
    Park, Jeongkyun
    Park, Rae-Hong
    Park, Hyung-Min
    [J]. APPLIED ACOUSTICS, 2023, 211
  • [4] Audio-visual speech enhancement with AVCDCN (audio-visual codebook dependent cepstral normalization)
    Deligne, S
    Potamianos, G
    Neti, C
    [J]. SAM2002: IEEE SENSOR ARRAY AND MULTICHANNEL SIGNAL PROCESSING WORKSHOP PROCEEDINGS, 2002, : 68 - 71
  • [5] Audio-visual speech in noise perception in dyslexia
    van Laarhoven, Thijs
    Keetels, Mirjam
    Schakel, Lemmy
    Vroomen, Jean
    [J]. DEVELOPMENTAL SCIENCE, 2018, 21 (01)
  • [6] Improved Lite Audio-Visual Speech Enhancement
    Chuang, Shang-Yi
    Wang, Hsin-Min
    Tsao, Yu
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2022, 30 : 1345 - 1359
  • [7] A ROBUST AUDIO-VISUAL SPEECH ENHANCEMENT MODEL
    Wang, Wupeng
    Xing, Chao
    Wang, Dong
    Chen, Xiao
    Sun, Fengyu
    [J]. 2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 7529 - 7533
  • [8] Improved Lite Audio-Visual Speech Enhancement
    Chuang, Shang-Yi
    Wang, Hsin-Min
    Tsao, Yu
    [J]. IEEE/ACM Transactions on Audio Speech and Language Processing, 2022, 30 : 1345 - 1359
  • [9] Optimizing Audio-Visual Speech Enhancement Using Multi-Level Distortion Measures for Audio-Visual Speech Recognition
    Chen, Hang
    Wang, Qing
    Du, Jun
    Yin, Bao-Cai
    Pan, Jia
    Lee, Chin-Hui
    [J]. IEEE/ACM Transactions on Audio Speech and Language Processing, 2024, 32 : 2508 - 2521
  • [10] AVSE CHALLENGE: AUDIO-VISUAL SPEECH ENHANCEMENT CHALLENGE
    Blanco, Andrea Lorena Aldana
    Valentini-Botinhao, Cassia
    Klejch, Ondrej
    Gogate, Mandar
    Dashtipour, Kia
    Hussain, Amir
    Bell, Peter
    [J]. 2022 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP, SLT, 2022, : 465 - 471