ON THE ROLE OF VISUAL CUES IN AUDIOVISUAL SPEECH ENHANCEMENT

Cited by: 2
Authors
Aldeneh, Zakaria [1]
Kumar, Anushree Prasanna [1]
Theobald, Barry-John [1]
Marchi, Erik [1]
Kajarekar, Sachin [1]
Naik, Devang [1]
Abdelaziz, Ahmed Hussen [1]
Affiliation
[1] Apple, Cupertino, CA 95014 USA
Keywords
audiovisual speech enhancement; lip reading; viseme classification; self-supervised learning
DOI
10.1109/ICASSP39728.2021.9414263
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
We present an introspection of an audiovisual speech enhancement model. In particular, we focus on interpreting how a neural audiovisual speech enhancement model uses visual cues to improve the quality of the target speech signal. We show that visual cues provide not only high-level information about speech activity, i.e., speech/silence, but also fine-grained visual information about the place of articulation. One byproduct of this finding is that the learned visual embeddings can be used as features for other visual speech applications. We demonstrate the effectiveness of the learned visual embeddings for classifying visemes (the visual analogy to phonemes). Our results provide insight into important aspects of audiovisual speech enhancement and demonstrate how such models can be used for self-supervision tasks for visual speech applications.
Pages: 8423-8427
Page count: 5