Robust Face Frontalization For Visual Speech Recognition

Times cited: 3

Authors:

Kang, Zhiqi [1,2]

Horaud, Radu [1,2]

Sadeghi, Mostafa [3]

Affiliations:

[1] Inria, Montbonnot St Martin, France

[2] Univ Grenoble Alpes, Montbonnot St Martin, France

[3] Inria Nancy Grand Est, Nancy, France

Source:

2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW 2021) | 2021

Keywords:

CLOSED-FORM SOLUTION;

DOI:

10.1109/ICCVW54120.2021.00281

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104; 0812; 0835; 1405;

Abstract:

Face frontalization consists of synthesizing a frontally viewed face from an arbitrarily viewed one. The main contribution of this paper is a robust frontalization method that preserves non-rigid facial deformations, i.e., expressions, for lip reading. The method iteratively estimates the rigid transformation (scale, rotation, and translation) and the non-rigid deformation between 3D landmarks extracted from an arbitrarily viewed face and 3D vertices parameterized by a deformable shape model. An important merit of the method is its ability to deal with large Gaussian and non-Gaussian errors in the data; for that purpose, we use the generalized Student-t distribution. The associated EM algorithm assigns a weight to each observed landmark (the higher the weight, the more important the landmark), thus favoring landmarks that are affected only by rigid head movements. We propose using the zero-mean normalized cross-correlation (ZNCC) score to evaluate the ability to preserve facial expressions. We show that the method, when incorporated into a deep lip-reading pipeline, considerably improves the word classification score on an in-the-wild benchmark.
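The abstract combines three computable ingredients: a closed-form estimate of scale, rotation, and translation between two 3D landmark sets, an EM loop whose Student-t weights down-weight unreliable landmarks, and the ZNCC score. The Python sketch below illustrates them under simplifying assumptions that are not taken from the paper: it uses the standard (not the generalized) Student-t with a fixed degrees-of-freedom parameter `nu` and isotropic noise, fits only a rigid similarity transform (no deformable shape model), and solves each weighted step with an Umeyama-style SVD. The function names `zncc`, `weighted_similarity`, and `robust_fit` are hypothetical, and NumPy is assumed to be installed.

```python
import numpy as np

# Sketch only: standard Student-t, isotropic noise, similarity transform.
# This is NOT the paper's generalized-t / deformable-shape-model algorithm.


def zncc(a, b):
    """Zero-mean normalized cross-correlation of two equally sized arrays, in [-1, 1]."""
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def weighted_similarity(src, dst, w):
    """Closed-form weighted least-squares fit of dst ~ s * R @ src + t.

    Umeyama-style SVD solution; src and dst are (n, 3) arrays, w holds n weights.
    """
    w = w / w.sum()
    mu_s, mu_d = w @ src, w @ dst              # weighted centroids
    xs, xd = src - mu_s, dst - mu_d
    cov = (w[:, None] * xd).T @ xs             # weighted cross-covariance matrix
    U, S, Vt = np.linalg.svd(cov)
    D = np.eye(cov.shape[0])
    D[-1, -1] = np.sign(np.linalg.det(U @ Vt))  # force a proper rotation (det = +1)
    R = U @ D @ Vt
    s = np.sum(S * np.diag(D)) / np.sum(w * np.sum(xs**2, axis=1))
    t = mu_d - s * R @ mu_s
    return s, R, t


def robust_fit(src, dst, nu=3.0, n_iter=100):
    """EM for a similarity transform under isotropic Student-t noise (fixed nu)."""
    n, d = src.shape
    w = np.ones(n)                             # first pass = ordinary least squares
    for _ in range(n_iter):
        # M-step: transform parameters given the current weights
        s, R, t = weighted_similarity(src, dst, w)
        r2 = np.sum((dst - (s * src @ R.T + t)) ** 2, axis=1)
        # M-step: isotropic variance given the weights and new residuals
        sigma2 = max(np.sum(w * r2) / (d * n), 1e-12)
        # E-step: expected precision per landmark; large residual -> small weight
        w = (nu + d) / (nu + r2 / sigma2)
    return s, R, t, w
```

With a small `nu`, landmarks whose residuals stay large under the best rigid fit (for example, points displaced by a facial expression) are down-weighted rather than discarded; as `nu` grows, every weight tends to 1 and the loop reduces to ordinary least squares.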


Pages: 2485-2495

Number of pages: 11
