ON THE ROLE OF VISUAL CUES IN AUDIOVISUAL SPEECH ENHANCEMENT

Cited by: 2
Authors
Aldeneh, Zakaria [1]
Kumar, Anushree Prasanna [1]
Theobald, Barry-John [1]
Marchi, Erik [1]
Kajarekar, Sachin [1]
Naik, Devang [1]
Abdelaziz, Ahmed Hussen [1]
Affiliation
[1] Apple, Cupertino, CA 95014 USA
Keywords
audiovisual speech enhancement; lip reading; viseme classification; self-supervised learning
DOI
10.1109/ICASSP39728.2021.9414263
Chinese Library Classification
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
We present an introspection of an audiovisual speech enhancement model. In particular, we focus on interpreting how a neural audiovisual speech enhancement model uses visual cues to improve the quality of the target speech signal. We show that visual cues provide not only high-level information about speech activity, i.e., speech/silence, but also fine-grained visual information about the place of articulation. One byproduct of this finding is that the learned visual embeddings can be used as features for other visual speech applications. We demonstrate the effectiveness of the learned visual embeddings for classifying visemes (the visual analogy to phonemes). Our results provide insight into important aspects of audiovisual speech enhancement and demonstrate how such models can be used for self-supervision tasks for visual speech applications.
Pages: 8423-8427
Page count: 5