Reconstructing intelligible audio speech from visual speech features

被引:0
|
作者
Le Cornu, Thomas [1 ]
Ben Milner [1 ]
机构
[1] Univ East Anglia, Norwich NR4 7TJ, Norfolk, England
关键词
speech intelligibility; visual speech; GMMs; DNNs; STRAIGHT; MODEL;
D O I
暂无
中图分类号
O42 [声学];
学科分类号
070206 ; 082403 ;
摘要
This work describes an investigation into the feasibility of producing intelligible audio speech from only visual speech features. The proposed method aims to estimate a spectral envelope from visual features which is then combined with an artificial excitation signal and used within a model of speech production to reconstruct an audio signal. Different combinations of audio and visual features are considered, along with both a statistical method of estimation and a deep neural network. The intelligibility of the reconstructed audio speech is measured by human listeners, and then compared to the intelligibility of the video signal only and when combined with the reconstructed audio.
引用
收藏
页码:3355 / 3359
页数:5
相关论文
共 50 条
  • [11] Audio-visual speech recognition based on joint training with audio-visual speech enhancement for robust speech recognition
    Hwang, Jung-Wook
    Park, Jeongkyun
    Park, Rae-Hong
    Park, Hyung-Min
    APPLIED ACOUSTICS, 2023, 211
  • [12] Integration of Deep Bottleneck Features for Audio-Visual Speech Recognition
    Ninomiya, Hiroshi
    Kitaoka, Norihide
    Tamura, Satoshi
    Iribe, Yurie
    Takeda, Kazuya
    16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 563 - 567
  • [13] Depth-based Features in Audio-Visual Speech Recognition
    Palecek, Karel
    Chaloupka, Josef
    2016 39TH INTERNATIONAL CONFERENCE ON TELECOMMUNICATIONS AND SIGNAL PROCESSING (TSP), 2016, : 303 - 306
  • [14] Audio-visual speech experience with age influences perceived audio-visual asynchrony in speech
    Alm, M. (magnus.alm@svt.ntnu.no), 1600, Acoustical Society of America (134):
  • [15] Analysis of lip geometric features for audio-visual speech recognition
    Kaynak, MN
    Zhi, Q
    Cheok, AD
    Sengupta, K
    Han, Z
    Chung, KC
    IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART A-SYSTEMS AND HUMANS, 2004, 34 (04): : 564 - 570
  • [16] Audio-visual speech experience with age influences perceived audio-visual asynchrony in speech
    Alm, Magnus
    Behne, Dawn
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2013, 134 (04): : 3001 - 3010
  • [17] Audio-visual speech perception without speech cues
    Saldana, HM
    Pisoni, DB
    Fellowes, JM
    Remez, RE
    ICSLP 96 - FOURTH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, VOLS 1-4, 1996, : 2187 - 2190
  • [18] Audio-Visual Speech Modeling for Continuous Speech Recognition
    Dupont, Stephane
    Luettin, Juergen
    IEEE TRANSACTIONS ON MULTIMEDIA, 2000, 2 (03) : 141 - 151
  • [19] Reconstructing neutral speech from tracheoesophageal speech
    Reddy, Abinay N.
    Rao, Achuth M., V
    Meenakshi, G. Nisha
    Ghosh, Prasanta Kumar
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 1541 - 1545
  • [20] Expressive audio-visual speech
    Bevacqua, E
    Pelachaud, C
    COMPUTER ANIMATION AND VIRTUAL WORLDS, 2004, 15 (3-4) : 297 - 304