A Multi-Level Alignment and Cross-Modal Unified Semantic Graph Refinement Network for Conversational Emotion Recognition

被引:0
|
作者
Zhang, Xiaoheng [1 ]
Cui, Weigang [2 ]
Hu, Bin [3 ]
Li, Yang [1 ]
机构
[1] Beihang Univ, Sch Automat Sci & Elect Engn, Beijing 100191, Peoples R China
[2] Beihang Univ, Sch Engn Med, Beijing 100191, Peoples R China
[3] Lanzhou Univ, Sch Informat Sci & Engn, Lanzhou 730000, Gansu, Peoples R China
基金
中国博士后科学基金; 国家杰出青年科学基金; 中国国家自然科学基金; 北京市自然科学基金;
关键词
Emotion recognition; cross-modal alignment; multimodal fusion; semantic refinement; FUSION;
D O I
10.1109/TAFFC.2024.3354382
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Emotion recognition in conversation (ERC) based on multiple modalities has attracted enormous attention. However, most research simply concatenated multimodal representations, generally neglecting the impact of cross-modal correspondences and uncertain factors, and leading to the cross-modal misalignment problems. Furthermore, recent methods only considered simple contextual features, commonly ignoring semantic clues and resulting in an insufficient capture of the semantic consistency. To address these limitations, we propose a novel multi-level alignment and cross-modal unified semantic graph refinement network (MA-CMU-SGRNet) for ERC task. Specifically, a multi-level alignment (MA) is first designed to bridge the gap between acoustic and lexical modalities, which can effectively contrast both the instance-level and prototype-level relationships, separating the multimodal features in the latent space. Second, a cross-modal uncertainty-aware unification (CMU) is adopted to generate a unified representation in joint space considering the ambiguity of emotion. Finally, a dual-encoding semantic graph refinement network (SGRNet) is investigated, which includes a syntactic encoder to aggregate information from near neighbors and a semantic encoder to focus on useful semantically close neighbors. Extensive experiments on three multimodal public datasets show the effectiveness of our proposed method compared with the state-of-the-art methods, indicating its potential application in conversational emotion recognition.
引用
收藏
页码:1553 / 1566
页数:14
相关论文
共 50 条
  • [1] Semantic enhancement and multi-level alignment network for cross-modal retrieval
    Chen, Jia
    Zhang, Hong
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2024,
  • [2] Multi-Level Cross-Modal Semantic Alignment Network for Video-Text Retrieval
    Nian, Fudong
    Ding, Ling
    Hu, Yuxia
    Gu, Yanhong
    [J]. MATHEMATICS, 2022, 10 (18)
  • [3] Speech Emotion Recognition via Multi-Level Cross-Modal Distillation
    Li, Ruichen
    Zhao, Jinming
    Jin, Qin
    [J]. INTERSPEECH 2021, 2021, : 4488 - 4492
  • [4] Multi-level Alignment Network for Domain Adaptive Cross-modal Retrieval
    Dong, Jianfeng
    Long, Zhongzi
    Mao, Xiaofeng
    Lin, Changting
    He, Yuan
    Ji, Shouling
    [J]. NEUROCOMPUTING, 2021, 440 : 207 - 219
  • [5] A Multi-Level Circulant Cross-Modal Transformer for Multimodal Speech Emotion Recognition
    Gong, Peizhu
    Liu, Jin
    Wu, Zhongdai
    Han, Bing
    Wang, Y. Ken
    He, Huihua
    [J]. CMC-COMPUTERS MATERIALS & CONTINUA, 2023, 74 (02): : 4203 - 4220
  • [6] Multi-Level Cross-Modal Alignment for Image Clustering
    Qiu, Liping
    Zhang, Qin
    Chen, Xiaojun
    Cai, Shaotian
    [J]. THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 13, 2024, : 14695 - 14703
  • [7] Semantic Alignment Network for Multi-Modal Emotion Recognition
    Hou, Mixiao
    Zhang, Zheng
    Liu, Chang
    Lu, Guangming
    [J]. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2023, 33 (09) : 5318 - 5329
  • [8] Cross-Modal Semantic Alignment and Information Refinement for Multi-Modal Sentiment Analysis
    Ding, Meirong
    Chen, Hongye
    Zeng, Biqing
    [J]. Computer Engineering and Applications, 2024, 60 (22) : 114 - 125
  • [9] Deep Multi-Level Semantic Hashing for Cross-Modal Retrieval
    Ji, Zhenyan
    Yao, Weina
    Wei, Wei
    Song, Houbing
    Pi, Huaiyu
    [J]. IEEE ACCESS, 2019, 7 : 23667 - 23674
  • [10] Cross-modal and multi-level feature refinement network for RGB-D salient object detection
    Gao, Yue
    Dai, Meng
    Zhang, Qing
    [J]. VISUAL COMPUTER, 2023, 39 (09): : 3979 - 3994