Facial Expression Recognition Based on Vision Transformer with Hybrid Local Attention

Cited by: 1
Authors
Tian, Yuan [1]
Zhu, Jingxuan [1]
Yao, Huang [1]
Chen, Di [1]
Affiliations
[1] Cent China Normal Univ, Fac Artificial Intelligence Educ, Wuhan 430079, Peoples R China
Source
APPLIED SCIENCES-BASEL | 2024 / Vol. 14 / Issue 15
Keywords
facial expression recognition; attention; vision transformer
DOI
10.3390/app14156471
Chinese Library Classification (CLC) number
O6 [Chemistry];
Discipline classification code
0703;
Abstract
Facial expression recognition has broad application prospects in many settings. Because facial expressions are complex and variable, it remains a challenging research topic. This paper proposes a Vision Transformer expression recognition method based on hybrid local attention (HLA-ViT). The network adopts a dual-stream structure: one stream extracts hybrid local features and the other extracts global contextual features. Together, the two streams constitute a global-local fusion attention. A hybrid local attention module is proposed to enhance the network's robustness to face occlusion and head pose variations. A convolutional neural network is combined with the hybrid local attention module to obtain feature maps with locally prominent information. The ViT then captures robust features from the global perspective of the visual sequence context. Finally, a decision-level fusion mechanism fuses the expression features with the locally prominent information, adding complementary information that improves recognition performance and robustness against interference factors such as occlusion and head pose changes in natural scenes. Extensive experiments show that the HLA-ViT network achieves 90.45% on RAF-DB, 90.13% on FERPlus, and 65.07% on AffectNet.
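The abstract names a decision-level fusion mechanism but does not give its rule. As a rough illustration only, the sketch below shows one common form of decision-level (late) fusion: each stream produces its own class scores, the scores are turned into probabilities, and the two probability vectors are averaged with a weight. The function names, the `local_weight` parameter, and the weighted-average rule itself are assumptions made for this example, not the authors' method.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_decisions(local_logits, global_logits, local_weight=0.5):
    """Weighted average of the two streams' class probabilities.

    Hypothetical fusion rule: `local_weight` is an illustrative parameter,
    not a value taken from the paper.
    """
    if len(local_logits) != len(global_logits):
        raise ValueError("both streams must score the same set of classes")
    if not 0.0 <= local_weight <= 1.0:
        raise ValueError("local_weight must lie in [0, 1]")
    p_local = softmax(local_logits)
    p_global = softmax(global_logits)
    return [
        local_weight * pl + (1.0 - local_weight) * pg
        for pl, pg in zip(p_local, p_global)
    ]


def predict(local_logits, global_logits, local_weight=0.5):
    """Return the index of the class with the highest fused probability."""
    fused = fuse_decisions(local_logits, global_logits, local_weight)
    return max(range(len(fused)), key=fused.__getitem__)


# Toy scores for three expression classes from each stream.
local_scores = [2.0, 0.5, 0.1]    # stands in for the hybrid-local stream
global_scores = [0.2, 1.5, 0.3]   # stands in for the global ViT stream
fused_probs = fuse_decisions(local_scores, global_scores)
label = predict(local_scores, global_scores)
```

Setting `local_weight` to 0 or 1 reduces the prediction to a single stream, which gives a simple way to check how much each stream contributes in such a scheme.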
Pages: 15
Related papers
50 in total
  • [31] Lightweight Facial Expression Recognition Based on Hybrid Multiscale and Multi-Head Collaborative Attention
    Zhang, Haitao
    Zhuang, Xufei
    Gao, Xudong
    Mao, Rui
    Ren, Qing-Dao-Er-Ji
    PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2024, PT II, 2025, 15032 : 304 - 316
  • [32] PIDViT: Pose-Invariant Distilled Vision Transformer for Facial Expression Recognition in the Wild
    Huang, Yin-Fu
    Tsai, Chia-Hsin
    IEEE TRANSACTIONS ON AFFECTIVE COMPUTING, 2023, 14 (04) : 3281 - 3293
  • [33] Facial Expression Recognition using Local Directional Pattern variants and Deep Learning Computer Vision and Facial Recognition
    Chengeta, Kennedy
    Viriri, Serestina
    2018 INTERNATIONAL CONFERENCE ON ALGORITHMS, COMPUTING AND ARTIFICIAL INTELLIGENCE (ACAI 2018), 2018,
  • [34] Self-supervised vision transformer-based few-shot learning for facial expression recognition
    Chen, Xuanchi
    Zheng, Xiangwei
    Sun, Kai
    Liu, Weilong
    Zhang, Yuang
    INFORMATION SCIENCES, 2023, 634 : 206 - 226
  • [35] LAS-Transformer: An Enhanced Transformer Based on the Local Attention Mechanism for Speech Recognition
    Fu, Pengbin
    Liu, Daxing
    Yang, Huirong
    INFORMATION, 2022, 13 (05)
  • [36] Enhanced Deep Learning Hybrid Model of CNN Based on Spatial Transformer Network for Facial Expression Recognition
    Khan, Nizamuddin
    Singh, Ajay Vikram
    Agrawal, Rajeev
    INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE, 2022, 36 (14)
  • [37] Expression snippet transformer for robust video-based facial expression recognition
    Liu, Yuanyuan
    Wang, Wenbin
    Feng, Chuanxu
    Zhang, Haoyu
    Chen, Zhe
    Zhan, Yibing
    PATTERN RECOGNITION, 2023, 138
  • [38] Facial Expression Recognition with Attention Mechanism
    Wang, Caixia
    Wang, Zhihui
    Cui, Dong
    2021 14TH INTERNATIONAL CONGRESS ON IMAGE AND SIGNAL PROCESSING, BIOMEDICAL ENGINEERING AND INFORMATICS (CISP-BMEI 2021), 2021,
  • [39] Facial Expression Recognition Based on Local Transitional Pattern
    Jabid, Taskeed
    Chae, Oksam
    INFORMATION-AN INTERNATIONAL INTERDISCIPLINARY JOURNAL, 2012, 15 (05) : 2007 - 2018
  • [40] Facial expression recognition based on local binary patterns
    Feng X.
    Pietikäinen M.
    Hadid A.
    Pattern Recogn. Image Anal., 2007, 4 : 592 - 598