Reconstructing intelligible audio speech from visual speech features

Cited by: 0

Authors:

Le Cornu, Thomas [1]

Ben Milner [1]

Affiliations:

[1] Univ East Anglia, Norwich NR4 7TJ, Norfolk, England

Source:

16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5 | 2015

Keywords:

speech intelligibility; visual speech; GMMs; DNNs; STRAIGHT; MODEL;

DOI:

Not available

Chinese Library Classification:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

This work describes an investigation into the feasibility of producing intelligible audio speech from only visual speech features. The proposed method aims to estimate a spectral envelope from visual features, which is then combined with an artificial excitation signal and used within a model of speech production to reconstruct an audio signal. Different combinations of audio and visual features are considered, along with both a statistical method of estimation and a deep neural network. The intelligibility of the reconstructed audio speech is measured by human listeners and then compared with the intelligibility of the video signal alone and of the video signal combined with the reconstructed audio.
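The reconstruction stage described in the abstract (a spectral envelope combined with an artificial excitation inside a speech production model) can be illustrated with a minimal source-filter sketch. This is not the authors' implementation: the keywords suggest STRAIGHT was used, whereas the code below is a plain NumPy overlap-add synthesiser. The envelope here is a hand-made placeholder standing in for the output of a visual-to-audio estimator, and every name and parameter value (`pulse_train`, `synthesise`, the 16 kHz rate, the 120 Hz pitch, the frame sizes) is an assumption chosen for illustration.

```python
import numpy as np


def pulse_train(n_samples, f0_hz, sample_rate):
    """Artificial voiced excitation: unit impulses one pitch period apart."""
    period = int(round(sample_rate / f0_hz))
    excitation = np.zeros(n_samples)
    excitation[::period] = 1.0
    return excitation


def synthesise(envelopes, excitation, frame_len, hop):
    """Shape the excitation with per-frame magnitude envelopes (overlap-add).

    envelopes has shape (n_frames, frame_len // 2 + 1). The output gain is
    not normalised; this sketch only demonstrates the signal flow.
    """
    n_frames = envelopes.shape[0]
    window = np.hanning(frame_len)
    out = np.zeros(hop * (n_frames - 1) + frame_len)
    for i in range(n_frames):
        start = i * hop
        frame = excitation[start:start + frame_len] * window
        # Filtering step: multiply the excitation spectrum by the envelope.
        spectrum = np.fft.rfft(frame) * envelopes[i]
        out[start:start + frame_len] += np.fft.irfft(spectrum, n=frame_len)
    return out


# Hypothetical settings; none of these values come from the paper.
sample_rate = 16000
frame_len, hop = 512, 128
n_frames = 50
n_samples = hop * (n_frames - 1) + frame_len

# Placeholder envelope: flat spectrum with one smooth resonance near 700 Hz.
# In the described method this would be estimated from visual features.
freqs = np.fft.rfftfreq(frame_len, d=1.0 / sample_rate)
envelope = 1.0 + 4.0 * np.exp(-0.5 * ((freqs - 700.0) / 150.0) ** 2)
envelopes = np.tile(envelope, (n_frames, 1))

excitation = pulse_train(n_samples, f0_hz=120.0, sample_rate=sample_rate)
audio = synthesise(envelopes, excitation, frame_len, hop)
```

Because the visual signal carries no direct pitch information, the excitation in this sketch is fully synthetic with a fixed pitch; only the envelope varies with the input, which matches the division of labour the abstract describes.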

Pages: 3355-3359

Page count: 5

