A spatio-temporal integrated model based on local and global features for video expression recognition

Cited by: 7

Authors:

Hu, Min [1,2]

Ge, Peng [1,2]

Wang, Xiaohua [1,2]

Lin, Hui [3]

Ren, Fuji [2,4]

Affiliations:

[1] Hefei Univ Technol, Key Lab Knowledge Engn Big Data, Minist Educ, Hefei 230601, Peoples R China

[2] Hefei Univ Technol, Sch Comp & Informat, Anhui Prov Key Lab Affect Comp & Adv Intelligent, Hefei 230601, Peoples R China

[3] Hefei Univ Technol, Sch Elect Sci & Applicat Phys, Hefei 230601, Peoples R China

[4] Univ Tokushima, Grad Sch Adv Technol & Sci, Tokushima 7708502, Japan

Source:

VISUAL COMPUTER | 2022, Vol. 38, No. 8

Funding:

National Natural Science Foundation of China

Keywords:

Video expression recognition; Local and global features; Attention mechanism; Feature recalibration; Network integration; NETWORK; SCALE;

DOI:

10.1007/s00371-021-02136-z

Chinese Library Classification:

TP31 [Computer software];

Subject classification codes:

081202 ; 0835 ;

Abstract:

Facial expressions can be represented largely by the dynamic variations of the key expressive parts of the face, i.e., eyebrows, eyes, nose, and mouth. The features of these parts are regarded as local features. However, global facial information is also useful for recognition because it is a necessary complement to local features. In this paper, a spatio-temporal integrated model that jointly learns local and global features is proposed for video expression recognition. Firstly, to capture the action of facial key units, a spatio-temporal attention part-gradient-based hierarchical bidirectional recurrent neural network (spatio-temporal attention PGHRNN) is constructed. It can capture the dynamic variations of gradients around facial landmark points. In addition, a new kind of spatial attention mechanism is introduced to adaptively recalibrate the features of the various facial parts. Secondly, to complement the local features extracted by the spatio-temporal attention PGHRNN, a 50-layer squeeze-and-excitation residual network with a long short-term memory network (SE-ResNet-50-LSTM) is used as a global feature extractor and classifier. Finally, to integrate the local and global features and improve the performance of facial expression recognition, a joint adaptive fine-tuning method (JAFTM) is proposed to combine the two networks, which can adaptively adjust the network weights. Extensive experiments demonstrate that the proposed model achieves a recognition accuracy of 98.95% on CK+ for 7-class facial expressions and 85.40% on the MMI database, outperforming other state-of-the-art methods.
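
The abstract names a squeeze-and-excitation (SE) network as the global feature extractor. As a rough illustration of what channel-wise feature recalibration means in a generic SE block, the following is a minimal pure-Python sketch. It is not the authors' implementation: the function name `se_recalibrate`, the toy tensor sizes, the reduction ratio of 2, the random weights, and the omission of biases are all assumptions made for illustration.

```python
import math
import random


def se_recalibrate(feature_maps, w_reduce, w_expand):
    """Rescale each channel of `feature_maps` by a gate in (0, 1).

    feature_maps: list of C channels, each an H x W list of floats.
    w_reduce:     (C // r) x C weight matrix for the bottleneck layer.
    w_expand:     C x (C // r) weight matrix restoring the channel count.
    Biases are omitted to keep the sketch short (an assumption).
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    squeezed = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_maps
    ]
    # Excitation, step 1: bottleneck fully connected layer with ReLU.
    hidden = [
        max(0.0, sum(w * s for w, s in zip(weights, squeezed)))
        for weights in w_reduce
    ]
    # Excitation, step 2: expand back to C channels, then a sigmoid gate.
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(weights, hidden))))
        for weights in w_expand
    ]
    # Scale: multiply every value in a channel by that channel's gate.
    scaled = [
        [[value * gate for value in row] for row in channel]
        for channel, gate in zip(feature_maps, gates)
    ]
    return scaled, gates


# Toy input (hypothetical): 4 channels of 2 x 2 features, reduction ratio 2.
rng = random.Random(0)
features = [[[rng.random() for _ in range(2)] for _ in range(2)] for _ in range(4)]
w_reduce = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(2)]
w_expand = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(4)]
scaled, gates = se_recalibrate(features, w_reduce, w_expand)
```

In a trained network the two weight matrices are learned, so the gates emphasize informative channels and suppress less useful ones. The spatial attention mechanism and JAFTM described in the abstract are separate components and are not modeled here.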


Pages: 2617-2634

Page count: 18


