A CROSS-ATTENTION EMOTION RECOGNITION ALGORITHM BASED ON AUDIO AND VIDEO MODALITIES

Cited: 0
Authors
Wu, Xiao [1 ]
Mu, Xuan [1 ]
Qi, Wen [3 ]
Liu, Xiaorui [1 ,2 ]
Institutions
[1] Qingdao Univ, Automat Sch, 308 Ningxia Rd, Qingdao 266000, Peoples R China
[2] Shandong Key Lab Ind Control, Qingdao 266071, Peoples R China
[3] South China Univ Technol, Sch Future Technol, Guangzhou 511436, Peoples R China
Source
2024 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING WORKSHOPS, ICASSPW 2024 | 2024
Keywords
multimodal; emotion recognition; parallel convolution; cross attention;
D O I
10.1109/ICASSPW62465.2024.10626511
CLC Classification
O42 [Acoustics];
Subject Classification
070206 ; 082403 ;
Abstract
In recent years, emotion recognition has received significant attention. In this paper, multimodal information, including speech and facial expressions, is adopted to classify human emotions. First, we propose a speech emotion recognition model based on a parallel convolutional module (Pconv) and a facial-expression emotion recognition model based on an improved Inception-ResNetV2 network. The recognized features of speech and expression are then fused by a cross-attention module coordinated with a Bidirectional Long Short-Term Memory (BiLSTM) network. Experiments on the CH-SIMS and CMU-MOSI datasets demonstrate that the proposed algorithm achieves high recognition accuracy, and each component of the model contributes fairly to the performance improvement.
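The abstract describes fusing speech and expression features with a cross-attention module. The paper does not give implementation details, so the following is only an illustrative sketch of generic scaled dot-product cross-attention in numpy, where one modality's features act as queries against the other modality's keys and values; the shapes, dimensions, and function names are assumptions, not the authors' code.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query_feats, key_value_feats):
    """Scaled dot-product cross-attention: one modality attends to the other.

    query_feats:     (T_q, d) array, e.g. speech-frame features
    key_value_feats: (T_k, d) array, e.g. facial-expression features
    Returns fused features of shape (T_q, d).
    """
    d = query_feats.shape[-1]
    scores = query_feats @ key_value_feats.T / np.sqrt(d)  # (T_q, T_k)
    weights = softmax(scores, axis=-1)  # each speech frame weights the video frames
    return weights @ key_value_feats    # (T_q, d) fused representation

rng = np.random.default_rng(0)
audio = rng.standard_normal((8, 16))   # 8 speech frames, 16-dim features (assumed sizes)
video = rng.standard_normal((12, 16))  # 12 video frames, 16-dim features
fused = cross_attention(audio, video)
print(fused.shape)  # (8, 16)
```

In the paper's pipeline the fused sequence would then be passed through a BiLSTM before classification; learned query/key/value projections, omitted here for brevity, would normally precede the attention step.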
Pages: 309 - 313
Page count: 5
Related Papers
50 records in total
  • [1] Mutual Cross-Attention in Dyadic Fusion Networks for Audio-Video Emotion Recognition
    Luo, Jiachen
    Phan, Huy
    Wang, Lin
    Reiss, Joshua
    2023 11TH INTERNATIONAL CONFERENCE ON AFFECTIVE COMPUTING AND INTELLIGENT INTERACTION WORKSHOPS AND DEMOS, ACIIW, 2023,
  • [2] A Joint Cross-Attention Model for Audio-Visual Fusion in Dimensional Emotion Recognition
    Praveen, R. Gnana
    de Melo, Wheidima Carneiro
    Ullah, Nasib
    Aslam, Haseeb
    Zeeshan, Osama
    Denorme, Theo
    Pedersoli, Marco
    Koerich, Alessandro L.
    Bacon, Simon
    Cardinal, Patrick
    Granger, Eric
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW 2022, 2022, : 2485 - 2494
  • [3] Audio-Visual Fusion for Emotion Recognition in the Valence-Arousal Space Using Joint Cross-Attention
    Praveen, R. Gnana
    Cardinal, Patrick
    Granger, Eric
    IEEE TRANSACTIONS ON BIOMETRICS, BEHAVIOR, AND IDENTITY SCIENCE, 2023, 5 (03): : 360 - 373
  • [4] AVaTER: Fusing Audio, Visual, and Textual Modalities Using Cross-Modal Attention for Emotion Recognition
    Das, Avishek
    Sarma, Moumita Sen
    Hoque, Mohammed Moshiul
    Siddique, Nazmul
    Dewan, M. Ali Akber
    SENSORS, 2024, 24 (18)
  • [5] CAST: Cross-Attention in Space and Time for Video Action Recognition
    Lee, Dongho
    Lee, Jongseo
    Choi, Jinwoo
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [6] Multimodal Cross-Attention Bayesian Network for Social News Emotion Recognition
    Wang, Xinzhi
    Li, Mengyue
    Chang, Yudong
    Luo, Xiangfeng
    Yao, Yige
    Li, Zhichao
    2023 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, IJCNN, 2023,
  • [7] IS CROSS-ATTENTION PREFERABLE TO SELF-ATTENTION FOR MULTI-MODAL EMOTION RECOGNITION?
    Rajan, Vandana
    Brutti, Alessio
    Cavallaro, Andrea
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 4693 - 4697
  • [8] Hierarchical LSTM-Based Audio and Video Emotion Recognition With Embedded Attention Mechanism
    Liu Tianbao
    Zhang Lingtao
    Yu Wentao
    Wei Dongchuan
    Fan Yijun
    LASER & OPTOELECTRONICS PROGRESS, 2021, 58 (02)
  • [9] Cross-Attention Transformer for Video Interpolation
    Kim, Hannah Halin
    Yu, Shuzhi
    Yuan, Shuai
    Tomasi, Carlo
    COMPUTER VISION - ACCV 2022 WORKSHOPS, 2023, 13848 : 325 - 342
  • [10] MSER: Multimodal speech emotion recognition using cross-attention with deep fusion
    Khan, Mustaqeem
    Gueaieb, Wail
    El Saddik, Abdulmotaleb
    Kwon, Soonil
    EXPERT SYSTEMS WITH APPLICATIONS, 2024, 245