A facial depression recognition method based on hybrid multi-head cross attention network

Cited by: 4
Authors
Li, Yutong [1 ]
Liu, Zhenyu [1 ]
Zhou, Li [1 ]
Yuan, Xiaoyan [1 ]
Shangguan, Zixuan [1 ]
Hu, Xiping [1 ]
Hu, Bin [1 ]
Affiliations
[1] Lanzhou Univ, Gansu Prov Key Lab Wearable Comp, Lanzhou, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
facial depression recognition; convolutional neural networks; attention mechanism; automatic depression estimation; end-to-end network; TEXTURE CLASSIFICATION; APPEARANCE;
DOI
10.3389/fnins.2023.1188434
CLC number
Q189 [Neuroscience]
Discipline code
071006
Abstract
Introduction: Deep learning methods based on convolutional neural networks (CNNs) have demonstrated impressive performance in depression analysis. Nevertheless, these methods face two critical challenges: (1) because of spatial locality, CNNs struggle to learn long-range inductive biases during low-level feature extraction across different facial regions; and (2) a model with only a single attention head cannot easily attend to several parts of the face at once, making it less sensitive to other facial regions associated with depression. In facial depression recognition, many cues come from several areas of the face simultaneously, e.g., the mouth and the eyes.
Methods: To address these issues, we present an end-to-end integrated framework called the Hybrid Multi-head Cross Attention Network (HMHN), which comprises two stages. The first stage consists of a Grid-Wise Attention block (GWA) and a Deep Feature Fusion block (DFF) for low-level visual depression feature learning. In the second stage, we obtain a global representation by encoding high-order interactions among local features with a Multi-head Cross Attention block (MAB) and an Attention Fusion block (AFB).
Results: We experimented on the AVEC 2013 and AVEC 2014 depression datasets. The results on AVEC 2013 (RMSE = 7.38, MAE = 6.05) and AVEC 2014 (RMSE = 7.60, MAE = 6.01) demonstrate the efficacy of our method, which outperforms most state-of-the-art video-based depression recognition approaches.
Discussion: We proposed a deep-learning hybrid model for depression recognition that captures higher-order interactions among the depression features of multiple facial regions. It effectively reduces depression-recognition error and shows great potential for clinical use.
Pages: 13
Related papers
50 records in total (items [41]–[50] shown below)
  • [41] Metrological parameter planning method based on a multi-head sparse graph attention network for airborne products
    Kong, Shengjie
    Huang, Xiang
    Li, Shuanggao
    Li, Gen
    Zhang, Dong
    MEASUREMENT, 2025, 242
  • [42] Interactive Selection Recommendation Based on the Multi-head Attention Graph Neural Network
    Zhang, Shuxi
    Chen, Jianxia
    Yao, Meihan
    Wu, Xinyun
    Ge, Yvfan
    Li, Shu
    NEURAL INFORMATION PROCESSING, ICONIP 2023, PT III, 2024, 14449 : 447 - 458
  • [43] Deep Multi-Head Attention Network for Aspect-Based Sentiment Analysis
    Yan, Danfeng
    Chen, Jiyuan
    Cui, Jianfei
    Shan, Ao
    Shi, Wenting
    2019 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2019, : 695 - 700
  • [44] On the diversity of multi-head attention
    Li, Jian
    Wang, Xing
    Tu, Zhaopeng
    Lyu, Michael R.
    NEUROCOMPUTING, 2021, 454 : 14 - 24
  • [45] A novel hybrid LSTM and masked multi-head attention based network for energy consumption prediction of industrial robots
    Wang, Zuoxue
    Jiang, Pei
    Li, Xiaobin
    He, Yan
    Wang, Xi Vincent
    Yang, Xue
    APPLIED ENERGY, 2025, 383
  • [46] Building pattern recognition by using an edge-attention multi-head graph convolutional network
    Wang, Haitao
    Xu, Yongyang
    Hu, Anna
    Xie, Xuejing
    Chen, Siqiong
    Xie, Zhong
    INTERNATIONAL JOURNAL OF GEOGRAPHICAL INFORMATION SCIENCE, 2025, 39 (04) : 732 - 757
  • [47] Recurrent multi-head attention fusion network for combining audio and text for speech emotion recognition
    Ahn, Chung-Soo
    Kasun, L. L. Chamara
    Sivadas, Sunil
    Rajapakse, Jagath C.
    INTERSPEECH 2022, 2022, : 744 - 748
  • [48] MULTI-HEAD ATTENTION FOR SPEECH EMOTION RECOGNITION WITH AUXILIARY LEARNING OF GENDER RECOGNITION
    Nediyanchath, Anish
    Paramasivam, Periyasamy
    Yenigalla, Promod
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 7179 - 7183
  • [49] Multi-head attention-based two-stream EfficientNet for action recognition
    Zhou, Aihua
    Ma, Yujun
    Ji, Wanting
    Zong, Ming
    Yang, Pei
    Wu, Min
    Liu, Mingzhe
    MULTIMEDIA SYSTEMS, 2023, 29 (02) : 487 - 498
  • [50] Music Emotion Recognition Using Multi-head Self-attention-Based Models
    Xiao, Yao
    Ruan, Haoxin
    Zhao, Xujian
    Jin, Peiquan
    Cai, Xuebo
    ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, ICIC 2023, PT IV, 2023, 14089 : 101 - 114