Reconstructing intelligible audio speech from visual speech features

Cited by: 0
Authors
Le Cornu, Thomas [1]
Milner, Ben [1]
Affiliation
[1] Univ East Anglia, Norwich NR4 7TJ, Norfolk, England
Keywords
speech intelligibility; visual speech; GMMs; DNNs; STRAIGHT; MODEL
DOI
Not available
CLC classification
O42 [Acoustics]
Subject classification
070206; 082403
Abstract
This work describes an investigation into the feasibility of producing intelligible audio speech from visual speech features alone. The proposed method estimates a spectral envelope from visual features, which is then combined with an artificial excitation signal and used within a model of speech production to reconstruct an audio signal. Different combinations of audio and visual features are considered, along with both a statistical (GMM-based) method of estimation and a deep neural network. The intelligibility of the reconstructed audio speech is measured by human listeners and compared with the intelligibility of the video signal alone and of the video combined with the reconstructed audio.
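The abstract outlines a pipeline: per-frame visual features are mapped to a spectral envelope (by a GMM or DNN), an artificial excitation signal is generated, and the two are combined in a source-filter style model of speech production (the paper uses STRAIGHT). The sketch below illustrates that flow under simplifying assumptions: a toy feed-forward network stands in for the GMM/DNN estimator, the envelope is represented by LPC coefficients rather than STRAIGHT spectra, and the sample rate, frame length, pitch, and excitation model are illustrative choices, not the authors' implementation.

```python
# Minimal sketch of the reconstruction pipeline described in the abstract.
# All sizes and models here (tiny network, LPC envelope, fixed-F0 pulse
# excitation) are illustrative assumptions, not the paper's GMM/DNN + STRAIGHT setup.
import numpy as np

FS = 8000          # sample rate (Hz), assumed
FRAME_LEN = 160    # 20 ms frames at 8 kHz
LPC_ORDER = 12     # order of the all-pole spectral-envelope model

def estimate_envelope(visual_feats, W1, b1, W2, b2):
    """Map per-frame visual features to LPC coefficients with a small
    feed-forward network (stand-in for the paper's GMM/DNN estimator)."""
    h = np.tanh(visual_feats @ W1 + b1)
    return h @ W2 + b2                      # shape: (n_frames, LPC_ORDER)

def artificial_excitation(voiced, f0=120.0):
    """Per-frame excitation: a pulse train for voiced frames, noise otherwise."""
    period = int(FS / f0)
    frames = []
    for v in voiced:
        frame = np.zeros(FRAME_LEN)
        if v:
            frame[::period] = 1.0           # impulse train at the assumed pitch
        else:
            frame = np.random.randn(FRAME_LEN) * 0.1
        frames.append(frame)
    return frames

def synthesise(lpc_frames, excitation_frames):
    """Source-filter synthesis: pass each frame's excitation through the
    all-pole filter defined by the estimated envelope coefficients."""
    audio = []
    for a, e in zip(lpc_frames, excitation_frames):
        y = np.zeros(FRAME_LEN)
        for n in range(FRAME_LEN):
            y[n] = e[n] - sum(a[k] * y[n - k - 1] for k in range(min(n, LPC_ORDER)))
        audio.append(y)
    return np.concatenate(audio)

if __name__ == "__main__":
    # Hypothetical feature sizes and randomly initialised weights, just to run end-to-end.
    n_frames, n_vis = 100, 20
    rng = np.random.default_rng(0)
    vis = rng.standard_normal((n_frames, n_vis))
    W1, b1 = rng.standard_normal((n_vis, 32)) * 0.1, np.zeros(32)
    W2, b2 = rng.standard_normal((32, LPC_ORDER)) * 0.01, np.zeros(LPC_ORDER)
    lpc = estimate_envelope(vis, W1, b1, W2, b2)
    exc = artificial_excitation([True] * n_frames)
    wav = synthesise(lpc, exc)              # 100 frames x 160 samples of audio
```

In the paper the envelope estimator is trained on parallel audio-visual data and synthesis is performed with STRAIGHT; the simple all-pole filter above is only a stand-in to keep the example self-contained.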
Pages: 3355-3359
Number of pages: 5
Related papers
50 records in total
  • [1] Generating Intelligible Audio Speech From Visual Speech
    Le Cornu, Thomas
    Milner, Ben
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2017, 25 (09) : 1447 - 1457
  • [2] Enhancing Audio Speech using Visual Speech Features
    Almajai, Ibrahim
    Milner, Ben
    INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 1915 - 1918
  • [3] Fusing audio and visual features of speech
    Pan, H
    Liang, ZP
    Huang, TS
    2000 INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, VOL III, PROCEEDINGS, 2000, : 214 - 217
  • [4] Towards reconstructing intelligible speech from the human auditory cortex
    Akbari, Hassan
    Khalighinejad, Bahar
    Herrero, Jose L.
    Mehta, Ashesh D.
    Mesgarani, Nima
    SCIENTIFIC REPORTS, 2019, 9 (1)
  • [5] A new approach to integrate audio and visual features of speech
    Pan, H
    Liang, ZP
    Huang, TS
    2000 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, PROCEEDINGS VOLS I-III, 2000, : 1093 - 1096
  • [6] CONTINUOUS VISUAL SPEECH RECOGNITION FOR AUDIO SPEECH ENHANCEMENT
    Benhaim, Eric
    Sahbi, Hichem
    Vitte, Guillaume
    2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, : 2244 - 2248
  • [7] Audio-Visual Speech Separation with Visual Features Enhanced by Adversarial Training
    Zhang, Peng
    Xu, Jiaming
    Shi, Jing
    Hao, Yunzhe
    Qin, Lei
    Xu, Bo
    2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2021,
  • [8] Audio-visual speech recognition using MPEG-4 compliant visual features
    Aleksic, PS
    Williams, JJ
    Wu, ZL
    Katsaggelos, AK
    EURASIP JOURNAL ON APPLIED SIGNAL PROCESSING, 2002, 2002 (11) : 1213 - 1227
  • [9] AV2AV: Direct Audio-Visual Speech to Audio-Visual Speech Translation with Unified Audio-Visual Speech Representation
    Choi, Jeongsoo
    Park, Se Jin
    Kim, Minsu
    Ro, Yong Man
    2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2024, : 27315 - 27327