A facial depression recognition method based on hybrid multi-head cross attention network

Cited by: 4
Authors
Li, Yutong [1]
Liu, Zhenyu [1]
Zhou, Li [1]
Yuan, Xiaoyan [1]
Shangguan, Zixuan [1]
Hu, Xiping [1]
Hu, Bin [1]
Affiliations
[1] Lanzhou Univ, Gansu Prov Key Lab Wearable Comp, Lanzhou, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
facial depression recognition; convolutional neural networks; attention mechanism; automatic depression estimation; end-to-end network; texture classification; appearance
DOI
10.3389/fnins.2023.1188434
Chinese Library Classification
Q189 [Neuroscience]
Discipline classification code
071006
Abstract
Introduction: Deep learning methods based on convolutional neural networks (CNNs) have demonstrated impressive performance in depression analysis. Nevertheless, two critical challenges remain in these methods: (1) Because of their spatial locality, CNNs still struggle to learn long-range inductive biases during the low-level feature extraction of different facial regions. (2) A model with only a single attention head has difficulty concentrating on several parts of the face simultaneously, which makes it less sensitive to other important facial regions associated with depression. In facial depression recognition, many of the clues come from a few areas of the face simultaneously, e.g., the mouth and eyes.

Methods: To address these issues, we present an end-to-end integrated framework called the Hybrid Multi-head Cross Attention Network (HMHN), which consists of two stages. The first stage uses the Grid-Wise Attention block (GWA) and the Deep Feature Fusion block (DFF) to learn low-level visual depression features. In the second stage, we obtain the global representation by encoding high-order interactions among local features with the Multi-head Cross Attention block (MAB) and the Attention Fusion block (AFB).

Results: We ran experiments on the AVEC 2013 and AVEC 2014 depression datasets. The results on AVEC 2013 (RMSE = 7.38, MAE = 6.05) and AVEC 2014 (RMSE = 7.60, MAE = 6.01) demonstrate the efficacy of our method, which outperformed most state-of-the-art video-based depression recognition approaches.

Discussion: We proposed a hybrid deep learning model for depression recognition that captures the higher-order interactions between the depression features of multiple facial regions. The model can effectively reduce the error in depression recognition and shows great potential for clinical experiments.
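The abstract reports results as RMSE (root mean square error) and MAE (mean absolute error). As a reading aid, the following is a minimal sketch of how these two metrics are conventionally computed from predicted and ground-truth scores. It is not the authors' evaluation code; the function names and the score values are hypothetical and chosen only for illustration.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between two equal-length score sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def mae(y_true, y_pred):
    """Mean absolute error between two equal-length score sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    absolute_errors = [abs(t - p) for t, p in zip(y_true, y_pred)]
    return sum(absolute_errors) / len(absolute_errors)


# Hypothetical depression scores (assumed values, not taken from AVEC).
true_scores = [10, 20, 30, 40]
pred_scores = [12, 18, 33, 35]

print("RMSE:", rmse(true_scores, pred_scores))
print("MAE:", mae(true_scores, pred_scores))
```

RMSE penalizes large individual errors more heavily than MAE, so on the same predictions RMSE is never smaller than MAE; this is consistent with the reported pairs (7.38 vs. 6.05 and 7.60 vs. 6.01).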
Pages: 13