A joint hierarchical cross-attention graph convolutional network for multi-modal facial expression recognition

Cited by: 1
Authors
Xu, Chujie [1]
Du, Yong [1]
Wang, Jingzi [2]
Zheng, Wenjie [1]
Li, Tiejun [1]
Yuan, Zhansheng [1]
Affiliations
[1] Jimei Univ, Sch Ocean Informat Engn, Xiamen, Peoples R China
[2] Natl Chengchi Univ, Dept Comp Sci, Chengchi, Taiwan
Keywords
cross-attention mechanism; emotional recognition in conversations; graph convolution network; IoT; multi-modal fusion; transformer; EMOTION RECOGNITION; VALENCE
DOI
10.1111/coin.12607
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Emotional recognition in conversations (ERC) is increasingly being applied in various IoT devices. Deep learning-based multimodal ERC has achieved great success by leveraging diverse and complementary modalities. Most existing methods adopt attention mechanisms to fuse the different sources of information, but they ignore the complementarity between modalities. The joint cross-attention model was introduced to alleviate this issue. However, it does not utilize multi-scale feature information from the different modalities. Moreover, context relationships play an important role in feature extraction for the expression recognition task. In this paper, we propose a novel joint hierarchical graph convolution network (JHGCN) that exploits features from different layers and context relationships for facial expression recognition based on audio-visual (A-V) information. Specifically, we adopt different deep networks to extract features from each modality individually. For the V modality, we construct V graph data from patch embeddings extracted by the transformer encoder. We also embed graph convolution into the transformer encoder to leverage intra-modality relationships. Then, the deep features from different layers are fed to the hierarchical fusion module to enhance the feature representation. Finally, we use the joint cross-attention mechanism to exploit the complementary inter-modality relationships. To validate the proposed model, we conducted various experiments on the AffWild2 and CMU-MOSI datasets. The results indicate that the proposed model compares favorably with the joint cross-attention model and other methods.
Pages: 18
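
The abstract's idea of letting one modality attend over the other can be illustrated with a minimal sketch. This is plain scaled dot-product cross-attention, not the paper's joint cross-attention or its JHGCN architecture. It assumes both modalities have already been projected to a shared feature dimension, and it omits the learned query, key, and value projections. The function names and toy feature values are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys_values):
    """Each query vector (one modality) attends over the vectors of the
    other modality. Assumption: both share the same feature dimension d,
    and no learned projection matrices are applied."""
    d = len(queries[0])
    scale = math.sqrt(d)
    attended = []
    for q in queries:
        # Similarity of this query to every vector of the other modality.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale
                  for k in keys_values]
        weights = softmax(scores)
        # Weighted average of the other modality's vectors.
        attended.append([sum(w * v[j] for w, v in zip(weights, keys_values))
                         for j in range(d)])
    return attended


# Toy features (hypothetical values): 3 audio steps and 2 visual steps, d = 2.
audio = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
visual = [[0.5, 0.5], [1.0, -1.0]]

audio_attended = cross_attention(audio, visual)   # audio queries, visual keys/values
visual_attended = cross_attention(visual, audio)  # visual queries, audio keys/values

# Simple fusion by concatenating each audio vector with its attended visual context.
fused = [a + c for a, c in zip(audio, audio_attended)]
```

Each attended vector is a convex combination of the other modality's vectors, so it carries complementary context weighted by similarity. A learned model would add projection matrices and train them end to end.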