Reconstructing intelligible audio speech from visual speech features

Cited by: 0

Authors:

Le Cornu, Thomas [1]

Milner, Ben [1]

Affiliations:

[1] Univ East Anglia, Norwich NR4 7TJ, Norfolk, England

Source:

16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5 | 2015

Keywords:

speech intelligibility; visual speech; GMMs; DNNs; STRAIGHT; MODEL

DOI:

Not available

Chinese Library Classification (CLC):

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

This work investigates the feasibility of producing intelligible audio speech from visual speech features alone. The proposed method aims to estimate a spectral envelope from visual features, which is then combined with an artificial excitation signal and used within a model of speech production to reconstruct an audio signal. Different combinations of audio and visual features are considered, along with both a statistical method of estimation and a deep neural network. The intelligibility of the reconstructed audio speech is measured by human listeners and compared with the intelligibility of the video signal alone and of the video signal combined with the reconstructed audio.

Pages: 3355-3359

Page count: 5
