ON THE ROLE OF VISUAL CUES IN AUDIOVISUAL SPEECH ENHANCEMENT

Cited by: 2
Authors
Aldeneh, Zakaria [1]
Kumar, Anushree Prasanna [1]
Theobald, Barry-John [1]
Marchi, Erik [1]
Kajarekar, Sachin [1]
Naik, Devang [1]
Abdelaziz, Ahmed Hussen [1]
Affiliations
[1] Apple, Cupertino, CA 95014 USA
Keywords
audiovisual speech enhancement; lip reading; viseme classification; self-supervised learning
DOI
10.1109/ICASSP39728.2021.9414263
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
We present an introspection of an audiovisual speech enhancement model. In particular, we focus on interpreting how a neural audiovisual speech enhancement model uses visual cues to improve the quality of the target speech signal. We show that visual cues provide not only high-level information about speech activity, i.e., speech/silence, but also fine-grained visual information about the place of articulation. One byproduct of this finding is that the learned visual embeddings can be used as features for other visual speech applications. We demonstrate the effectiveness of the learned visual embeddings for classifying visemes (the visual analogue of phonemes). Our results provide insight into important aspects of audiovisual speech enhancement and demonstrate how such models can be used for self-supervision tasks for visual speech applications.
Pages: 8423-8427
Number of pages: 5