Reconstructing intelligible audio speech from visual speech features

Authors
Le Cornu, Thomas [1]
Milner, Ben [1]
Affiliation
[1] Univ East Anglia, Norwich NR4 7TJ, Norfolk, England
Keywords
speech intelligibility; visual speech; GMMs; DNNs; STRAIGHT; MODEL
DOI
Not available
Chinese Library Classification (CLC)
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
This work investigates the feasibility of producing intelligible audio speech from visual speech features alone. The proposed method estimates a spectral envelope from visual features, which is then combined with an artificial excitation signal and used within a model of speech production to reconstruct an audio signal. Different combinations of audio and visual features are considered, along with both a statistical method of estimation and a deep neural network. The intelligibility of the reconstructed audio speech is measured by human listeners and compared with the intelligibility of the video signal alone and of the video combined with the reconstructed audio.
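The reconstruction step described in the abstract (a spectral envelope shaping an artificial excitation) can be illustrated with a minimal source-filter sketch. This is not the authors' implementation and it does not use STRAIGHT: it assumes the per-frame magnitude envelopes have already been estimated by some model, uses a fixed-pitch impulse train as the excitation, and applies the envelope by frequency-domain multiplication with overlap-add. The function names, sampling rate, pitch, and frame sizes are all illustrative assumptions.

```python
import numpy as np

def impulse_train(n_samples, fs, f0):
    """Artificial voiced excitation: unit impulses one pitch period apart (assumed fixed f0)."""
    excitation = np.zeros(n_samples)
    period = int(round(fs / f0))
    excitation[::period] = 1.0
    return excitation

def synthesize(envelopes, fs=16000, f0=100.0, frame_len=512, hop=128):
    """Shape an artificial excitation with per-frame spectral envelopes via overlap-add.

    envelopes: array of shape (n_frames, frame_len // 2 + 1) holding magnitude
    envelopes. In the setting of the abstract these would be estimated from
    visual features; here they are simply taken as given.
    """
    n_frames = envelopes.shape[0]
    n_samples = hop * (n_frames - 1) + frame_len
    excitation = impulse_train(n_samples, fs, f0)
    window = np.hanning(frame_len)
    out = np.zeros(n_samples)
    for i, env in enumerate(envelopes):
        start = i * hop
        # Window one frame of excitation and move to the frequency domain.
        frame = excitation[start:start + frame_len] * window
        # Filtering step: multiply the excitation spectrum by the envelope.
        shaped = np.fft.rfft(frame) * env
        # Back to the time domain and overlap-add into the output signal.
        out[start:start + frame_len] += np.fft.irfft(shaped, n=frame_len)
    return out

# Placeholder input: ten flat envelopes standing in for estimated ones.
flat_envelopes = np.ones((10, 257))
audio = synthesize(flat_envelopes)
```

With flat envelopes the output is just the windowed excitation; replacing `flat_envelopes` with estimated envelopes would impose their spectral shape on the same excitation.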
Pages: 3355-3359
Page count: 5