ON THE ROLE OF VISUAL CUES IN AUDIOVISUAL SPEECH ENHANCEMENT

Cited by: 2
Authors
Aldeneh, Zakaria [1 ]
Kumar, Anushree Prasanna [1 ]
Theobald, Barry-John [1 ]
Marchi, Erik [1 ]
Kajarekar, Sachin [1 ]
Naik, Devang [1 ]
Abdelaziz, Ahmed Hussen [1 ]
Affiliation
[1] Apple, Cupertino, CA 95014 USA
Keywords
audiovisual speech enhancement; lip reading; viseme classification; self-supervised learning
DOI
10.1109/ICASSP39728.2021.9414263
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
We present an introspection of an audiovisual speech enhancement model. In particular, we focus on interpreting how a neural audiovisual speech enhancement model uses visual cues to improve the quality of the target speech signal. We show that visual cues provide not only high-level information about speech activity, i.e., speech/silence, but also fine-grained visual information about the place of articulation. One byproduct of this finding is that the learned visual embeddings can be used as features for other visual speech applications. We demonstrate the effectiveness of the learned visual embeddings for classifying visemes (the visual analogue of phonemes). Our results provide insight into important aspects of audiovisual speech enhancement and demonstrate how such models can be used for self-supervision tasks for visual speech applications.
Pages: 8423-8427
Page count: 5
Related papers
50 in total
  • [41] DESIGNING FOR VISUAL CUES THAT ENHANCE SPEECH RECEPTION
    BOYER, LL
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1976, 59 : S11 - S11
  • [42] Effects of Visual Speech Envelope on Audiovisual Speech Perception in Multitalker Listening Environments
    Yuan, Yi
    Meyers, Kelli
    Borges, Kayla
    Lleo, Yasneli
    Fiorentino, Katarina A.
    Oh, Yonghee
    JOURNAL OF SPEECH LANGUAGE AND HEARING RESEARCH, 2021, 64 (07) : 2845 - 2853
  • [43] The Role of Auditory and Visual Cues in the Perception of Mandarin Emotional Speech in Male Drug Addicts
    Geng, Puyang
    Fan, Ningxue
    Ling, Rong
    Guo, Hong
    Lu, Qimeng
    Chen, Xingwen
    SPEECH COMMUNICATION, 2023, 155
  • [44] Assessing the role of attention in the audiovisual integration of speech
    Navarra, Jordi
    Alsius, Agnes
    Soto-Faraco, Salvador
    Spence, Charles
    INFORMATION FUSION, 2010, 11 (01) : 4 - 11
  • [45] The influence of selective attention to auditory and visual speech on the integration of audiovisual speech information
    Buchan, Julie N.
    Munhall, Kevin G.
    PERCEPTION, 2011, 40 (10) : 1164 - 1182
  • [46] Recognition of Accented Speech by Cochlear-Implant Listeners: Benefit of Audiovisual Cues
    Waddington, Emily
    Jaekel, Brittany N.
    Tinnemore, Anna R.
    Gordon-Salant, Sandra
    Goupell, Matthew J.
    EAR AND HEARING, 2020, 41 (05) : 1236 - 1250
  • [47] Enhancement of Visual Perception with Use of Dynamic Cues
    Andia, Marcelo E.
    Plett, Johannes
    Tejos, Cristian
    Guarini, Marcelo W.
    Navarro, Maria E.
    Razmilic, Dravna
    Meneses, Luis
    Villalon, Manuel J.
    Irarrazaval, Pablo
    RADIOLOGY, 2009, 250 (02) : 551 - 557
  • [48] Speech identification in noise: Contribution of temporal, spectral, and visual speech cues
    Kim, Jeesun
    Davis, Chris
    Groot, Christopher
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2009, 126 (06) : 3246 - 3257
  • [49] The effect of visual speech timing and form cues on the processing of speech and nonspeech
    Davis, Chris
    Kim, Jeesun
    14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5, 2013 : 1638 - 1641
  • [50] Developmental Shifts in Detection and Attention for Auditory, Visual and Audiovisual Speech
    Jerger, Susan
    Damian, Markus F.
    Karl, Cassandra
    Abdi, Herve
    JOURNAL OF SPEECH LANGUAGE AND HEARING RESEARCH, 2018, 61 (12) : 3095 - 3112