Robust Face Frontalization For Visual Speech Recognition

Cited by: 3
Authors
Kang, Zhiqi [1 ,2 ]
Horaud, Radu [1 ,2 ]
Sadeghi, Mostafa [3 ]
Affiliations
[1] Inria, Montbonnot St Martin, France
[2] Univ Grenoble Alpes, Montbonnot St Martin, France
[3] Inria Nancy Grand Est, Nancy, France
Source
2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW 2021) | 2021
Keywords
CLOSED-FORM SOLUTION;
DOI
10.1109/ICCVW54120.2021.00281
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Face frontalization consists of synthesizing a frontally-viewed face from an arbitrarily-viewed one. The main contribution of this paper is a robust frontalization method that preserves non-rigid facial deformations, i.e. expressions, to perform lip reading. The method iteratively estimates the rigid transformation (scale, rotation, and translation) and the non-rigid deformation between 3D landmarks extracted from an arbitrarily-viewed face, and 3D vertices parameterized by a deformable shape model. An important merit of the method is its ability to deal with large Gaussian and non-Gaussian errors in the data. For that purpose, we use the generalized Student-t distribution. The associated EM algorithm assigns a weight to each observed landmark, the higher the weight the more important the landmark, thus favoring landmarks that are only affected by rigid head movements. We propose to use the zero-mean normalized cross-correlation (ZNCC) score to evaluate the ability to preserve facial expressions. We show that the method, when incorporated into a deep lip-reading pipeline, considerably improves the word classification score on an in-the-wild benchmark.
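The ZNCC score named in the abstract can be illustrated with a minimal sketch. This is the standard textbook definition of zero-mean normalized cross-correlation between two equally sized arrays; the abstract does not say which image regions the authors compare or how they aggregate scores, so the function name `zncc`, the `eps` guard, and the choice to return 0.0 for constant inputs are assumptions made for illustration only.

```python
import numpy as np


def zncc(a, b, eps=1e-12):
    """Zero-mean normalized cross-correlation of two same-shape arrays.

    Returns a value in [-1, 1]; 1 means the inputs are identical up to a
    positive gain and an offset. Hypothetical helper, not the authors' code.
    """
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    if a.shape != b.shape:
        raise ValueError("inputs must have the same number of elements")

    # Remove the mean so the score ignores a constant brightness offset.
    a0 = a - a.mean()
    b0 = b - b.mean()

    # Normalize by the norms so the score ignores a positive gain factor.
    denom = np.linalg.norm(a0) * np.linalg.norm(b0)
    if denom < eps:
        # Assumption: a constant patch carries no structure, so score it 0.
        return 0.0
    return float(np.dot(a0, b0) / denom)
```

Because the score is invariant to gain and offset, it responds to the spatial pattern of intensities rather than to overall brightness, which is one plausible reason to use it when comparing a frontalized face against a reference frontal view.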
Pages: 2485-2495
Page count: 11
Related papers
50 in total
  • [31] AUDIO-VISUAL DEEP LEARNING FOR NOISE ROBUST SPEECH RECOGNITION
    Huang, Jing
    Kingsbury, Brian
    2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013, : 7596 - 7599
  • [32] Robust Sensor Fusion: Analysis and Application to Audio Visual Speech Recognition
    Javier R. Movellan
    Paul Mineiro
    Machine Learning, 1998, 32 : 85 - 100
  • [33] Robust Audio-Visual Speech Recognition Based on Hybrid Fusion
    Liu, Hong
    Li, Wenhao
    Yang, Bing
    2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2021, : 7580 - 7586
  • [34] Robust sensor fusion: Analysis and application to audio visual speech recognition
    Movellan, JR
    Mineiro, P
    MACHINE LEARNING, 1998, 32 (02) : 85 - 100
  • [35] Robust Self-Supervised Audio-Visual Speech Recognition
    Shi, Bowen
    Hsu, Wei-Ning
    Mohamed, Abdelrahman
    INTERSPEECH 2022, 2022, : 2118 - 2122
  • [36] Integrating audio and visual information to provide highly robust speech recognition
    Tomlinson, MJ
    Russell, MJ
    Brooke, NM
    1996 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, CONFERENCE PROCEEDINGS, VOLS 1-6, 1996, : 821 - 824
  • [37] A ROBUST AND REAL-TIME VISUAL SPEECH RECOGNITION FOR SMARTPHONE APPLICATION
    Song, Min Gyu
    Tariquzzamani, Md
    Kim, Jin Young
    Hwang, Seong Taek
    Chi, Seung Ho
    INTERNATIONAL JOURNAL OF INNOVATIVE COMPUTING INFORMATION AND CONTROL, 2012, 8 (04): : 2837 - 2853
  • [38] Early visual deprivation affects the development of face recognition and of audio-visual speech perception
    Putzar, Lisa
    Hoetting, Kirsten
    Roeder, Brigitte
    RESTORATIVE NEUROLOGY AND NEUROSCIENCE, 2010, 28 (02) : 251 - 257
  • [39] A robust speech analysis in speech recognition
    Miyanaga, Y
    Gozen, S
    Ohtsuki, N
    2000 5TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING PROCEEDINGS, VOLS I-III, 2000, : 706 - 709
  • [40] Video-Based Emotion Recognition using Face Frontalization and Deep Spatiotemporal Feature
    Wang, Jinwei
    Zhao, Ziping
    Liang, Jinglian
    Li, Chao
    2018 FIRST ASIAN CONFERENCE ON AFFECTIVE COMPUTING AND INTELLIGENT INTERACTION (ACII ASIA), 2018,