A spatio-temporal integrated model based on local and global features for video expression recognition

Cited by: 0
Authors
Min Hu
Peng Ge
Xiaohua Wang
Hui Lin
Fuji Ren
Affiliations
[1] Hefei University of Technology, Key Laboratory of Knowledge Engineering with Big Data, Ministry of Education
[2] Hefei University of Technology, School of Computer and Information, Anhui Province Key Laboratory of Affective Computing and Advanced Intelligent Machine
[3] Hefei University of Technology, School of Electronic Science and Application Physics
[4] University of Tokushima, Graduate School of Advanced Technology and Science
Source
The Visual Computer | 2022 / Vol. 38
Keywords
Video expression recognition; Local and global features; Attention mechanism; Feature recalibration; Network integration
DOI
Not available
Abstract
Facial expressions can be represented largely by the dynamic variations of key facial parts, i.e., the eyebrows, eyes, nose, and mouth. The features of these parts are regarded as local features. However, global facial information is also useful for recognition because it complements the local features. In this paper, a spatio-temporal integrated model that jointly learns local and global features is proposed for video expression recognition. First, to capture the action of key facial units, a spatio-temporal attention part-gradient-based hierarchical bidirectional recurrent neural network (spatio-temporal attention PGHRNN) is constructed. It captures the dynamic variations of gradients around facial landmark points. In addition, a new spatial attention mechanism is introduced to adaptively recalibrate the features of the various facial parts. Second, to complement the local features extracted by the spatio-temporal attention PGHRNN, a 50-layer squeeze-and-excitation residual network with a long short-term memory network (SE-ResNet-50-LSTM) is used as a global feature extractor and classifier. Finally, to integrate the local and global features and improve recognition performance, a joint adaptive fine-tuning method (JAFTM) is proposed to combine the two networks; it adaptively adjusts the network weights. Extensive experiments show that the proposed model achieves a recognition accuracy of 98.95% on CK+ for 7-class facial expressions and 85.40% on the MMI database, outperforming other state-of-the-art methods.
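The abstract names squeeze-and-excitation (SE) as the recalibration component of the global branch. The following is a minimal sketch of the general SE idea in plain Python, not the authors' implementation: each channel is pooled to a scalar, a small two-layer gate maps those scalars to values in (0, 1), and each channel is rescaled by its gate. All function names, the toy feature maps, and the weights are hypothetical and chosen only for illustration.

```python
import math


def squeeze(feature_maps):
    """Global average pooling: reduce each 2-D channel to one scalar."""
    return [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_maps
    ]


def dense(vec, weights, bias):
    """Fully connected layer: one output per weight row."""
    return [
        sum(w * v for w, v in zip(row, vec)) + b
        for row, b in zip(weights, bias)
    ]


def excite(pooled, w1, b1, w2, b2):
    """Bottleneck gate: dense -> ReLU -> dense -> sigmoid."""
    hidden = [max(0.0, h) for h in dense(pooled, w1, b1)]
    return [1.0 / (1.0 + math.exp(-s)) for s in dense(hidden, w2, b2)]


def se_recalibrate(feature_maps, w1, b1, w2, b2):
    """Scale every channel by its learned gate; return (scaled maps, gates)."""
    gates = excite(squeeze(feature_maps), w1, b1, w2, b2)
    scaled = [
        [[g * x for x in row] for row in channel]
        for channel, g in zip(feature_maps, gates)
    ]
    return scaled, gates


# Toy input (assumed values): two 2x2 channels.
features = [
    [[1.0, 2.0], [3.0, 4.0]],  # channel 0, mean 2.5
    [[0.0, 0.0], [0.0, 0.0]],  # channel 1, mean 0.0
]
# Hypothetical gate weights: 2 channels -> 1 hidden unit -> 2 channels.
w1, b1 = [[1.0, 1.0]], [0.0]
w2, b2 = [[1.0], [-1.0]], [0.0, 0.0]

scaled, gates = se_recalibrate(features, w1, b1, w2, b2)
print(gates)
```

In a trained network the gate weights are learned jointly with the convolutional layers, so informative channels receive gates near 1 and uninformative ones are suppressed.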
Pages: 2617-2634
Page count: 17
Related papers
50 in total
  • [1] A spatio-temporal integrated model based on local and global features for video expression recognition
    Hu, Min
    Ge, Peng
    Wang, Xiaohua
    Lin, Hui
    Ren, Fuji
    [J]. VISUAL COMPUTER, 2022, 38 (08): 2617-2634
  • [2] Facial Expression Recognition Based on the Fusion of Spatio-temporal Features in Video Sequences
    Wang, Xiaohua
    Xia, Chen
    Hu, Min
    Ren, Fuji
    [J]. JOURNAL OF ELECTRONICS & INFORMATION TECHNOLOGY, 2018, 40 (03): 626-632
  • [3] Video Behavior Recognition of Dairy Cows Based on Spatio-temporal Features
    Wang, Kejian
    Sun, Yifei
    Si, Yongsheng
    Han, Xianzhong
    He, Zhenxue
    [J]. Nongye Jixie Xuebao/Transactions of the Chinese Society for Agricultural Machinery, 2023, 54 (05): 261-267
  • [4] Facial Expression Recognition Based on Combination of Spatio-temporal and Spectral Features in Local Facial Regions
    Abounasr, Nakisa
    Pourghassem, Hossein
    [J]. 2013 8TH IRANIAN CONFERENCE ON MACHINE VISION & IMAGE PROCESSING (MVIP 2013), 2013: 446-450
  • [5] GaitSlice: A gait recognition model based on spatio-temporal slice features
    Li, Huakang
    Qiu, Yidan
    Zhao, Huimin
    Zhan, Jin
    Chen, Rongjun
    Wei, Tuanjie
    Huang, Zhihui
    [J]. Pattern Recognition, 2022, 124
  • [7] ACTION RECOGNITION BY ORTHOGONALIZED SUBSPACES OF LOCAL SPATIO-TEMPORAL FEATURES
    Raytchev, Bisser
    Shigenaka, Ryosuke
    Tamaki, Toru
    Kaneda, Kazufumi
    [J]. 2013 20TH IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP 2013), 2013: 4387-4391
  • [8] Video-based Emotion Recognition using Aggregated Features and Spatio-temporal Information
    Xu, Jinchang
    Dong, Yuan
    Ma, Lilei
    Bai, Hongliang
    [J]. 2018 24TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2018: 2833-2838
  • [9] Robust Skeleton-based Action Recognition through Hierarchical Aggregation of Local and Global Spatio-temporal Features
    Ren, J.
    Napoleon, R.
    Andre, B.
    Chris, S.
    Liu, M.
    Ma, J.
    [J]. 2018 15TH INTERNATIONAL CONFERENCE ON CONTROL, AUTOMATION, ROBOTICS AND VISION (ICARCV), 2018: 901-906
  • [10] SKELETON ACTION RECOGNITION BASED ON SPATIO-TEMPORAL FEATURES
    Huang, Qian
    Xie, Mengting
    Li, Xing
    Wang, Shuaichen
    [J]. 2023 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2023: 3284-3288