Audio-Visual Speech Asynchrony Modeling in a Talking Head

Times cited: 0

Authors:

Karpov, Alexey [1]

Tsirulnik, Liliya [2]

Krnoul, Zdenek [3]

Ronzhin, Andrey [1]

Lobanov, Boris [2]

Zelezny, Milos [3]

Affiliations:

[1] Russian Acad Sci, St Petersburg Inst Informat & Automat, St Petersburg, Russia

[2] Natl Acad Sci, United Inst Informat Problems, Minsk, Belarus

[3] Univ West Bohemia, Plzen, Czech Republic

Source:

INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5 | 2009

Funding:

Russian Foundation for Basic Research

Keywords:

audio-visual speech processing; text-to-speech synthesis; multimodal speech perception; cognitive study

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Theory of artificial intelligence]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

This paper proposes an audio-visual speech synthesis system that models the asynchrony between the auditory and visual speech modalities. A corpus-based study of real recordings provided the data needed to understand this asynchrony, which is partly caused by coarticulation. From this study, a set of context-dependent timing rules and recommendations was developed so that the auditory and visual speech cues of the animated talking head are synchronized in a natural, human-like way. A cognitive evaluation of the model-based talking head for Russian, with the proposed asynchrony model implemented, showed high intelligibility and naturalness of the synthesized audio-visual speech.
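The record does not reproduce the paper's timing rules. As a rough illustration only, the sketch below shows one way context-dependent timing rules could be applied to a talking head: each viseme onset is shifted earlier than the audio onset of its phoneme by a lead that depends on the phoneme and its left context. The phoneme classes, the `sil` pause label, the 100 ms and 50 ms leads, and the names `Segment`, `visual_lead_ms` and `align_visemes` are all hypothetical placeholders and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical phoneme classes (placeholders, not the paper's Russian inventory).
LABIALS = {"p", "b", "m", "f", "v"}
ROUNDED_VOWELS = {"o", "u"}
PAUSE = "sil"  # hypothetical pause label


@dataclass(frozen=True)
class Segment:
    label: str
    start_ms: int
    end_ms: int


def visual_lead_ms(prev_label: Optional[str], label: str) -> int:
    """Hypothetical timing rule: how many ms the viseme onset precedes the audio onset.

    The 100 ms / 50 ms values are illustrative placeholders, not figures from the paper.
    """
    if label in LABIALS or label in ROUNDED_VOWELS:
        # Assumed longer visual anticipation after a pause or at utterance start.
        return 100 if prev_label in (None, PAUSE) else 50
    return 0


def align_visemes(phonemes: List[Segment]) -> List[Segment]:
    """Derive a viseme track whose onsets may lead the corresponding audio onsets."""
    onsets: List[int] = []
    prev_label: Optional[str] = None
    for seg in phonemes:
        onset = max(seg.start_ms - visual_lead_ms(prev_label, seg.label), 0)
        if onsets:
            # Never let a viseme start before the previous one (keeps the order monotonic).
            onset = max(onset, onsets[-1])
        onsets.append(onset)
        prev_label = seg.label
    visemes = []
    for i, seg in enumerate(phonemes):
        # Each viseme lasts until the next viseme starts; the last one ends with the audio.
        end = onsets[i + 1] if i + 1 < len(phonemes) else seg.end_ms
        visemes.append(Segment(seg.label, onsets[i], end))
    return visemes
```

For an audio track of silence (0 to 200 ms) followed by /m a p a/, these placeholder rules start the /m/ viseme at 100 ms, 100 ms before its sound, and the /p/ viseme at 350 ms, 50 ms before its sound, while the vowel visemes stay aligned with the audio.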

Pages: 2883 / +

Number of pages: 2
