Multimodal and Multi-view Models for Emotion Recognition

Cited by: 0
Authors
Aguilar, Gustavo [1 ]
Rozgic, Viktor [2 ]
Wang, Weiran [2 ]
Wang, Chao [2 ]
Affiliations
[1] Univ Houston, Houston, TX 77004 USA
[2] Amazon.com, Seattle, WA 98108 USA
Keywords
DOI
Not available
CLC Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Studies on emotion recognition (ER) show that combining lexical and acoustic information yields more robust and accurate models. Most of these studies focus on settings where both modalities are available during training and evaluation. In practice, however, this is not always the case; obtaining ASR output can be a bottleneck in a deployment pipeline due to computational complexity or privacy-related constraints. To address this challenge, we study how to combine acoustic and lexical modalities efficiently during training while still providing a deployable acoustic model that does not require lexical inputs. We first experiment with multimodal models and two attention mechanisms to assess how much benefit lexical information can provide. Then, we frame the task as a multi-view learning problem and induce semantic information from a multimodal model into our acoustic-only network using a contrastive loss function. Our multimodal model outperforms the previous state of the art reported on the USC-IEMOCAP dataset with lexical and acoustic information. Additionally, our multi-view-trained acoustic network significantly surpasses models trained exclusively on acoustic features.
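The multi-view step described in the abstract (inducing semantic information from a multimodal model into an acoustic-only network via a contrastive loss) can be illustrated with a short sketch. The code below is an assumption-laden illustration, not the authors' implementation: it uses a generic margin-based contrastive loss in PyTorch, and the margin value, embedding sizes, and function names are hypothetical.

```python
# Minimal sketch (not the authors' implementation) of the multi-view idea:
# align an acoustic-only embedding with a multimodal (acoustic + lexical)
# embedding using a margin-based contrastive loss, so the acoustic network
# can later be deployed on its own. Margin, dimensions, and names are
# illustrative assumptions.
import torch
import torch.nn.functional as F

def contrastive_alignment_loss(acoustic_emb, multimodal_emb, margin=0.5):
    """Pull each acoustic embedding toward the multimodal embedding of the
    same utterance and push it away from those of other utterances."""
    # Cosine similarity between every acoustic/multimodal pair in the batch.
    a = F.normalize(acoustic_emb, dim=-1)          # (B, D)
    m = F.normalize(multimodal_emb, dim=-1)        # (B, D)
    sim = a @ m.t()                                # (B, B)

    pos = sim.diag().unsqueeze(1)                  # matched pairs, (B, 1)
    neg = sim                                      # all pairs, (B, B)

    # Hinge: mismatched pairs should score at least `margin` below the match.
    mask = 1.0 - torch.eye(sim.size(0), device=sim.device)
    loss = torch.clamp(margin - pos + neg, min=0.0) * mask
    return loss.sum() / mask.sum()

# Toy usage with random embeddings standing in for encoder outputs.
if __name__ == "__main__":
    acoustic = torch.randn(8, 128)     # from the deployable acoustic-only network
    multimodal = torch.randn(8, 128)   # from the acoustic + lexical model
    print(contrastive_alignment_loss(acoustic, multimodal).item())
```

In this sketch only the acoustic encoder would be needed at inference time; the multimodal embeddings serve purely as a training-time target, which mirrors the deployment constraint the abstract describes.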
Pages: 991 - 1002
Page count: 12
Related Papers
50 records in total
  • [1] Multimodal speech emotion recognition based on multi-scale MFCCs and multi-view attention mechanism
    Feng, Lin
    Liu, Lu-Yao
    Liu, Sheng-Lan
    Zhou, Jian
    Yang, Han-Qing
    Yang, Jie
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 82 (19) : 28917 - 28935
  • [2] EMOTION RECOGNITION BASED ON MULTI-VIEW BODY GESTURES
    Shen, Zhijuan
    Cheng, Jun
    Hu, Xiping
    Dong, Qian
    [J]. 2019 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2019, : 3317 - 3321
  • [3] Multi-view Laplacian least squares for human emotion recognition
    Guo, Shuai
    Feng, Lin
    Feng, Zhan-Bo
    Li, Yi-Hao
    Wang, Yang
    Liu, Sheng-Lan
    Qiao, Hong
    [J]. NEUROCOMPUTING, 2019, 370 : 78 - 87
  • [4] Emotion-aware Multi-view Contrastive Learning for Facial Emotion Recognition
    Kim, Daeha
    Song, Byung Cheol
    [J]. COMPUTER VISION, ECCV 2022, PT XIII, 2022, 13673 : 178 - 195
  • [5] Multi-view Common Space Learning for Emotion Recognition in the Wild
    Wu, Jianlong
    Lin, Zhouchen
    Zha, Hongbin
    [J]. ICMI'16: PROCEEDINGS OF THE 18TH ACM INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION, 2016, : 464 - 471
  • [6] Extrapolating single view face models for multi-view recognition
    Sanderson, C
    Bengio, S
    [J]. PROCEEDINGS OF THE 2004 INTELLIGENT SENSORS, SENSOR NETWORKS & INFORMATION PROCESSING CONFERENCE, 2004, : 581 - 586
  • [7] A Bootstrapped Multi-View Weighted Kernel Fusion Framework for Cross-Corpus Integration of Multimodal Emotion Recognition
    Chang, Chun-Min
    Su, Bo-Hao
    Lin, Shih-Chen
    Li, Jeng-Lin
    Lee, Chi-Chun
    [J]. 2017 SEVENTH INTERNATIONAL CONFERENCE ON AFFECTIVE COMPUTING AND INTELLIGENT INTERACTION (ACII), 2017, : 377 - 382
  • [8] Multi-View Speech Emotion Recognition Via Collective Relation Construction
    Hou, Mixiao
    Zhang, Zheng
    Cao, Qi
    Zhang, David
    Lu, Guangming
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2022, 30 : 218 - 229
  • [9] HIERARCHICAL AND MULTI-VIEW DEPENDENCY MODELLING NETWORK FOR CONVERSATIONAL EMOTION RECOGNITION
    Ruan, Yu-Ping
    Zheng, Shu-Kai
    Li, Taihao
    Wang, Fen
    Pei, Guanxiong
    [J]. 2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 7032 - 7036