Audio-Visual Speech Asynchrony Modeling in a Talking Head

Cited by: 0
Authors
Karpov, Alexey [1]
Tsirulnik, Liliya [2]
Krnoul, Zdenek [3]
Ronzhin, Andrey [1]
Lobanov, Boris [2]
Zelezny, Milos [3]
Affiliations
[1] Russian Acad Sci, St Petersburg Inst Informat & Automat, Moscow, Russia
[2] Natl Acad Sci, United Inst Informat Problems, Minsk, Belarus
[3] Univ West Bohemia, Plzen, Czech Republic
Funding
Russian Foundation for Basic Research
Keywords
audio-visual speech processing; text-to-speech synthesis; multimodal speech perception; cognitive study
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper proposes an audio-visual speech synthesis system that models the asynchrony between the auditory and visual speech modalities. A corpus-based study of real recordings provided the data needed to understand this asynchrony, which is partially caused by co-articulation. A set of context-dependent timing rules and recommendations was developed so that the synchronization of the animated talking head's auditory and visual speech cues resembles that of natural human speech. A cognitive evaluation of the model-based talking head for Russian, which implements the original asynchrony model, showed high intelligibility and naturalness of the synthesized audio-visual speech.
Pages: 2883 / +
Page count: 2