Audio-visual speech translation with automatic lip synchronization and face tracking based on 3-D head model

Cited by: 0
Authors
Morishima, Shigeo [1]
Ogata, Shin [1]
Murai, Kazumasa [1]
Nakamura, Satoshi [1]
Affiliation
[1] ATR Spoken Language Translation Res., 2-2-2 Hikaridai Seika-cho, Soraku-gun Kyoto, 619-0288, Japan
Keywords
Compendex;
DOI
Not available
Chinese Library Classification number
Subject classification number
Abstract
Algorithms - Computer simulation - Face recognition - Interpolation - Speech recognition - Synchronization - Three dimensional computer graphics - Translation (languages)
Related papers
50 in total
  • [1] Audio-visual speech translation with automatic lip synchronization and face tracking based on 3-D head model
    Morishima, S
    Ogata, S
    Murai, K
    Nakamura, S
    [J]. 2002 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-IV, PROCEEDINGS, 2002, : 2117 - 2120
  • [2] On the Audio-visual Synchronization for Lip-to-Speech Synthesis
    Niu, Zhe
    Mak, Brian
    [J]. 2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 7809 - 7818
  • [3] Realtime lip contour tracking for audio-visual speech recognition applications
    Yazdi, Mehran
    Seyfi, Mehdi
    Rafati, Amirhossein
    Asadi, Meghdad
    [J]. World Academy of Science, Engineering and Technology, 2009, 40 : 164 - 167
  • [4] Lip Tracking Method for the System of Audio-Visual Polish Speech Recognition
    Kubanek, Mariusz
    Bobulski, Janusz
    Adrjanowicz, Lukasz
    [J]. ARTIFICIAL INTELLIGENCE AND SOFT COMPUTING, PT I, 2012, 7267 : 535 - 542
  • [5] THE USE OF DYNAMIC DEFORMABLE TEMPLATES FOR LIP TRACKING IN AN AUDIO-VISUAL CORPUS WITH LARGE VARIATIONS IN HEAD POSE, FACE ILLUMINATION AND LIP SHAPES
    Wu, Zhiyong
    Wu, Jiying
    Meng, Helen M.
    [J]. 2008 6TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, 2008, : 370 - 373
  • [6] A Facial Feature and Lip Movement Enhanced Audio-Visual Speech Separation Model
    Li, Guizhu
    Fu, Min
    Sun, Mengnan
    Liu, Xuefeng
    Zheng, Bing
    [J]. SENSORS, 2023, 23 (21)
  • [7] A 3-D Audio-Visual Corpus of Affective Communication
    Fanelli, Gabriele
    Gall, Juergen
    Romsdorfer, Harald
    Weise, Thibaut
    Van Gool, Luc
    [J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2010, 12 (06) : 591 - 598
  • [8] Multimodal Learning Using 3D Audio-Visual Data for Audio-Visual Speech Recognition
    Su, Rongfeng
    Wang, Lan
    Liu, Xunying
    [J]. 2017 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING (IALP), 2017, : 40 - 43
  • [9] Audio-visual speech recognition integrating 3D lip information obtained from the Kinect
    Wang, Jianrong
    Zhang, Ju
    Honda, Kiyoshi
    Wei, Jianguo
    Dang, Jianwu
    [J]. MULTIMEDIA SYSTEMS, 2016, 22 (03) : 315 - 323