Spatial-Temporal Attention Network for Depression Recognition from facial videos

Cited by: 10
Authors
Pan, Yuchen [1]
Shang, Yuanyuan [1, 3]
Liu, Tie [1, 4]
Shao, Zhuhong [1, 4]
Guo, Guodong [2]
Ding, Hui [1, 4]
Hu, Qiang [5]
Affiliations
[1] Capital Normal Univ, Coll Informat Engn, Beijing 100048, Peoples R China
[2] West Virginia Univ, Lane Dept Comp Sci & Elect Engn, Morgantown, WV 26506 USA
[3] Beijing Adv Innovat Ctr Imaging Technol, Beijing 100048, Peoples R China
[4] Beijing Key Lab Elect Syst Reliabil Technol, Beijing 100048, Peoples R China
[5] ZhenJiang Mental Hlth Ctr, Dept Psychiat, Zhenjiang 212000, Jiangsu, Peoples R China
Keywords
Depression recognition; Attention mechanism; Video recognition; Deep learning; Visualization; Convolutional neural network; Deep networks; Appearance
DOI
10.1016/j.eswa.2023.121410
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Recent studies focus on the use of deep learning approaches to recognize depression from facial videos. However, these approaches have been hindered by their limited performance, which can be attributed to inadequate consideration of global spatial-temporal relationships in significant local regions within faces. In this paper, we propose the Spatial-Temporal Attention Depression Recognition Network (STA-DRN) to enhance feature extraction and increase the relevance of depression recognition by capturing global and local spatial-temporal information. Our proposed approach includes a novel Spatial-Temporal Attention (STA) mechanism, which generates spatial and temporal attention vectors to capture the global and local spatial-temporal relationships of features. To the best of our knowledge, this is the first attempt to incorporate pixel-wise STA mechanisms for depression recognition based on 3D video analysis. Additionally, we propose an attention vector-wise fusion strategy in the STA module, which combines information from both the spatial and temporal domains. We then design the STA-DRN by stacking STA modules in a ResNet-style manner. The experimental results on AVEC 2013 and AVEC 2014 show that our method achieves competitive performance, with mean absolute error/root mean square error (MAE/RMSE) scores of 6.15/7.98 and 6.00/7.75, respectively. Moreover, visualization analysis demonstrates that the STA-DRN responds significantly in specific locations related to depression. The code is available at: https://github.com/divertingPan/STA-DRN.
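The abstract reports results as MAE/RMSE pairs. As a reminder of how these two metrics are usually defined, here is a minimal sketch in plain Python. It is not taken from the authors' repository: the function names `mae` and `rmse` and the toy score lists are hypothetical and chosen for illustration only.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error: the average of |prediction - label|."""
    return sum(abs(p - t) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error: the square root of the average squared error."""
    squared = sum((p - t) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


# Hypothetical ground-truth scores and model predictions (not real AVEC data).
labels = [10.0, 20.0, 30.0, 5.0]
preds = [12.0, 17.0, 30.0, 10.0]

print("MAE :", mae(labels, preds))
print("RMSE:", rmse(labels, preds))
```

Because RMSE squares each error before averaging, it is never smaller than MAE on the same predictions, which matches the ordering of the reported pairs (6.15/7.98 and 6.00/7.75).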
Pages: 12
Related papers
50 in total
  • [1] Recurrent Spatial-Temporal Attention Network for Action Recognition in Videos
    Du, Wenbin
    Wang, Yali
    Qiao, Yu
    [J]. IEEE TRANSACTIONS ON IMAGE PROCESSING, 2018, 27 (03) : 1347 - 1360
  • [2] Robust Heart Rate Estimation With Spatial-Temporal Attention Network From Facial Videos
    Hu, Min
    Qian, Fei
    Wang, Xiaohua
    He, Lei
    Guo, Dong
    Ren, Fuji
    [J]. IEEE TRANSACTIONS ON COGNITIVE AND DEVELOPMENTAL SYSTEMS, 2022, 14 (02) : 639 - 647
  • [3] Attention-based spatial-temporal hierarchical ConvLSTM network for action recognition in videos
    Xue, Fei
    Ji, Hongbing
    Zhang, Wenbo
    Cao, Yi
    [J]. IET COMPUTER VISION, 2019, 13 (08) : 708 - 718
  • [4] Spatial-Temporal Convolutional Attention Network for Action Recognition
    Luo, Huilan
    Chen, Han
    [J]. Computer Engineering and Applications, 2023, 59 (09) : 150 - 158
  • [5] Facial Expression Recognition Based on Spatial-Temporal Fusion with Attention Mechanism
    Zhang, Lifeng
    Zheng, Xiangwei
    Chen, Xuanchi
    Ren, Xiuxiu
    Ji, Cun
    [J]. NEURAL PROCESSING LETTERS, 2023, 55 (05) : 6109 - 6124
  • [6] Facial Expression Recognition Based on Spatial-Temporal Fusion with Attention Mechanism
    Lifeng Zhang
    Xiangwei Zheng
    Xuanchi Chen
    Xiuxiu Ren
    Cun Ji
    [J]. Neural Processing Letters, 2023, 55 : 6109 - 6124
  • [7] A Mix Fusion Spatial-Temporal Network for Facial Expression Recognition
    Shu, Chang
    Xue, Feng
    [J]. PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2023, PT V, 2024, 14429 : 315 - 326
  • [8] Spatial-temporal pooling for action recognition in videos
    Wang, Jiaming
    Shao, Zhenfeng
    Huang, Xiao
    Lu, Tao
    Zhang, Ruiqian
    Lv, Xianwei
    [J]. NEUROCOMPUTING, 2021, 451 : 265 - 278
  • [9] Spatial-Temporal Attention for Action Recognition
    Sun, Dengdi
    Wu, Hanqing
    Ding, Zhuanlian
    Luo, Bin
    Tang, Jin
    [J]. ADVANCES IN MULTIMEDIA INFORMATION PROCESSING, PT I, 2018, 11164 : 854 - 864
  • [10] STCAM: Spatial-Temporal and Channel Attention Module for Dynamic Facial Expression Recognition
    Chen, Weicong
    Zhang, Dong
    Li, Ming
    Lee, Dah-Jye
    [J]. IEEE TRANSACTIONS ON AFFECTIVE COMPUTING, 2023, 14 (01) : 800 - 810