A joint hierarchical cross-attention graph convolutional network for multi-modal facial expression recognition

Cited by: 1
Authors
Xu, Chujie [1 ]
Du, Yong [1 ]
Wang, Jingzi [2 ]
Zheng, Wenjie [1 ]
Li, Tiejun [1 ]
Yuan, Zhansheng [1 ]
Affiliations
[1] Jimei University, School of Ocean Information Engineering, Xiamen, China
[2] National Chengchi University, Department of Computer Science, Taipei, Taiwan
Keywords
cross-attention mechanism; emotional recognition in conversations; graph convolution network; IoT; multi-modal fusion; transformer; emotion recognition; valence
DOI
10.1111/coin.12607
CLC number
TP18 [artificial intelligence theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Emotional recognition in conversations (ERC) is increasingly being applied in IoT devices. Deep learning-based multimodal ERC has achieved great success by leveraging diverse and complementary modalities. Although most existing methods adopt attention mechanisms to fuse information from different modalities, they ignore the complementarity between modalities. The joint cross-attention model was introduced to alleviate this issue, but it does not exploit multi-scale feature information across modalities. Moreover, context relationships play an important role in feature extraction for the expression recognition task. In this paper, we propose a novel joint hierarchical graph convolutional network (JHGCN) that exploits features from different layers and context relationships for facial expression recognition based on audio-visual (A-V) information. Specifically, we adopt separate deep networks to extract features from each modality. For the V modality, we construct graph data from the patch embeddings extracted by the transformer encoder, and we embed a graph convolution, which captures intra-modality relationships, into the transformer encoder. The deep features from different layers are then fed to a hierarchical fusion module to enhance the feature representation. Finally, a joint cross-attention mechanism exploits the complementary inter-modality relationships. To validate the proposed model, we conducted experiments on the AffWild2 and CMU-MOSI datasets. The results confirm that our model achieves highly promising performance compared to the joint cross-attention model and other methods.
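
The pipeline described in the abstract (graph convolution embedded in the transformer encoder for intra-modality structure, hierarchical fusion of features taken at different depths, and joint cross-attention for inter-modality complementarity) can be sketched in a few PyTorch modules. The sketch below is only illustrative: the k-nearest-neighbour graph construction, all module and parameter names, and the exact attention scoring are assumptions, not the authors' released implementation.

import torch
import torch.nn as nn

def knn_adjacency(x: torch.Tensor, k: int = 8) -> torch.Tensor:
    # x: (B, N, D) patch embeddings. Connect each patch to its k nearest
    # neighbours in feature space (an assumed, illustrative graph construction).
    dist = torch.cdist(x, x)                          # (B, N, N) pairwise distances
    idx = dist.topk(k + 1, largest=False).indices     # k neighbours + self
    adj = torch.zeros_like(dist).scatter_(-1, idx, 1.0)
    adj = ((adj + adj.transpose(1, 2)) > 0).float()   # symmetrise
    return adj / adj.sum(-1, keepdim=True)            # row-normalise

class GraphConvBlock(nn.Module):
    """One graph-convolution step over patch tokens, meant to sit between
    transformer encoder layers to model intra-modality relationships."""
    def __init__(self, dim: int):
        super().__init__()
        self.proj = nn.Linear(dim, dim)
        self.norm = nn.LayerNorm(dim)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        adj = knn_adjacency(tokens)
        return tokens + self.norm(adj @ self.proj(tokens))  # residual A·X·W

class HierarchicalFusion(nn.Module):
    """Fuse features taken from several encoder depths (multi-scale)."""
    def __init__(self, dim: int, num_levels: int):
        super().__init__()
        self.proj = nn.Linear(num_levels * dim, dim)

    def forward(self, feats):                # feats: list of (B, N, D) tensors
        return self.proj(torch.cat(feats, dim=-1))

class JointCrossAttention(nn.Module):
    """Each modality attends over the joint (concatenated) A-V representation,
    so complementary inter-modality information re-weights its features."""
    def __init__(self, dim: int):
        super().__init__()
        self.w_joint = nn.Linear(2 * dim, dim, bias=False)
        self.w_a = nn.Linear(dim, dim, bias=False)
        self.w_v = nn.Linear(dim, dim, bias=False)
        self.scale = dim ** -0.5

    def forward(self, f_a: torch.Tensor, f_v: torch.Tensor):
        # f_a, f_v: (B, T, D) audio / visual features
        joint = self.w_joint(torch.cat([f_a, f_v], dim=-1))
        att_a = torch.softmax(self.w_a(f_a) @ joint.transpose(1, 2) * self.scale, -1)
        att_v = torch.softmax(self.w_v(f_v) @ joint.transpose(1, 2) * self.scale, -1)
        # residuals keep the intra-modality signal alongside the attended one
        return att_a @ joint + f_a, att_v @ joint + f_v

if __name__ == "__main__":
    B, N, D = 2, 16, 64
    v_tokens = torch.randn(B, N, D)          # visual patch embeddings
    a_tokens = torch.randn(B, N, D)          # audio frame embeddings
    v_deep = GraphConvBlock(D)(v_tokens)     # intra-modality graph reasoning
    v = HierarchicalFusion(D, 2)([v_tokens, v_deep])  # multi-depth fusion
    f_a, f_v = JointCrossAttention(D)(a_tokens, v)    # inter-modality fusion
    print(f_a.shape, f_v.shape)              # torch.Size([2, 16, 64]) each

The residual connections in GraphConvBlock and JointCrossAttention are a design choice that preserves each modality's own features when the graph or attention weights are uninformative; the paper's actual fusion and classification head are not reproduced here.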
Pages: 18