DISENTANGLEMENT FOR AUDIO-VISUAL EMOTION RECOGNITION USING MULTITASK SETUP

Cited by: 6
Authors
Peri, Raghuveer [1 ,3 ]
Parthasarathy, Srinivas [2 ]
Bradshaw, Charles [2 ]
Sundaram, Shiva [2 ]
Affiliations
[1] Univ Southern Calif, Los Angeles, CA 90007 USA
[2] Amazon Inc, Sunnyvale, CA USA
[3] Amazon, Sunnyvale, CA USA
Keywords
Emotion recognition; multimodal learning; disentanglement; multitask learning; EXPRESSION; SPEAKER;
DOI
10.1109/ICASSP39728.2021.9414705
CLC Classification
O42 [Acoustics];
Discipline Code
070206; 082403;
Abstract
Deep learning models trained on audio-visual data have been successfully used to achieve state-of-the-art performance for emotion recognition. In particular, models trained with multitask learning have shown additional performance improvements. However, such multitask models entangle information between the tasks, encoding the mutual dependencies present in the label distributions of the real-world data used for training. This work explores the disentanglement of multimodal signal representations for the primary task of emotion recognition and a secondary person identification task. In particular, we develop a multitask framework to extract low-dimensional embeddings that aim to capture emotion-specific information while containing minimal information related to person identity. We evaluate three different techniques for disentanglement and report results of up to 13% disentanglement while maintaining emotion recognition performance.
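The abstract describes a multitask setup in which a shared audio-visual embedding serves a primary emotion head and a secondary identity head, with the identity information suppressed. The record does not detail the three disentanglement techniques the paper evaluates; the sketch below illustrates one common approach of this kind, an adversarial (gradient-reversal-style) objective that penalizes identity information in the shared embedding. All dimensions, names, and the loss weight `lam` are hypothetical, and the linear heads stand in for whatever networks the paper actually uses.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions -- not specified in this record.
AUDIO_DIM, VIDEO_DIM, EMB_DIM = 40, 512, 64
N_EMOTIONS, N_SPEAKERS = 4, 10

# Shared encoder projecting concatenated audio-visual features to a
# low-dimensional embedding, plus one linear head per task.
W_enc = rng.normal(0, 0.01, (AUDIO_DIM + VIDEO_DIM, EMB_DIM))
W_emo = rng.normal(0, 0.01, (EMB_DIM, N_EMOTIONS))   # primary: emotion
W_id = rng.normal(0, 0.01, (EMB_DIM, N_SPEAKERS))    # secondary: identity

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(probs, labels):
    return -np.log(probs[np.arange(len(labels)), labels] + 1e-9).mean()

def multitask_loss(audio, video, y_emo, y_id, lam=0.1):
    """Combined objective: minimize the emotion loss while *maximizing*
    the identity loss on the shared embedding, so that gradient descent
    removes speaker information from it (gradient-reversal style)."""
    z = np.tanh(np.concatenate([audio, video], axis=1) @ W_enc)
    l_emo = cross_entropy(softmax(z @ W_emo), y_emo)
    l_id = cross_entropy(softmax(z @ W_id), y_id)
    return l_emo - lam * l_id  # adversarial sign flip on the identity task

# Toy batch of 8 audio-visual examples with random labels.
audio = rng.normal(size=(8, AUDIO_DIM))
video = rng.normal(size=(8, VIDEO_DIM))
loss = multitask_loss(audio, video,
                      rng.integers(0, N_EMOTIONS, 8),
                      rng.integers(0, N_SPEAKERS, 8))
print(float(loss))
```

In practice the identity head is trained normally while only the reversed gradient reaches the encoder; the single sign-flipped loss above is the simplest scalar stand-in for that scheme.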
Pages: 6344-6348
Page count: 5
Related Papers
50 items in total
  • [1] AUDIO-VISUAL EMOTION RECOGNITION USING BOLTZMANN ZIPPERS
    Lu, Kun
    Jia, Yunde
    [J]. 2012 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP 2012), 2012, : 2589 - 2592
  • [2] Audio-visual spontaneous emotion recognition
    Zeng, Zhihong
    Hu, Yuxiao
    Roisman, Glenn I.
    Wen, Zhen
    Fu, Yun
    Huang, Thomas S.
    [J]. ARTIFICIAL INTELLIGENCE FOR HUMAN COMPUTING, 2007, 4451 : 72 - +
  • [3] Multimodal Emotion Recognition using Physiological and Audio-Visual Features
    Matsuda, Yuki
    Fedotov, Dmitrii
    Takahashi, Yuta
    Arakawa, Yutaka
    Yasumo, Keiichi
    Minker, Wolfgang
    [J]. PROCEEDINGS OF THE 2018 ACM INTERNATIONAL JOINT CONFERENCE ON PERVASIVE AND UBIQUITOUS COMPUTING AND PROCEEDINGS OF THE 2018 ACM INTERNATIONAL SYMPOSIUM ON WEARABLE COMPUTERS (UBICOMP/ISWC'18 ADJUNCT), 2018, : 946 - 951
  • [4] Audio-Visual Attention Networks for Emotion Recognition
    Lee, Jiyoung
    Kim, Sunok
    Kim, Seungryong
    Sohn, Kwanghoon
    [J]. AVSU'18: PROCEEDINGS OF THE 2018 WORKSHOP ON AUDIO-VISUAL SCENE UNDERSTANDING FOR IMMERSIVE MULTIMEDIA, 2018, : 27 - 32
  • [5] Deep operational audio-visual emotion recognition
    Akturk, Kaan
    Keceli, Ali Seydi
    [J]. NEUROCOMPUTING, 2024, 588
  • [6] Audio-Visual Emotion Recognition in Video Clips
    Noroozi, Fatemeh
    Marjanovic, Marina
    Njegus, Angelina
    Escalera, Sergio
    Anbarjafari, Gholamreza
    [J]. IEEE TRANSACTIONS ON AFFECTIVE COMPUTING, 2019, 10 (01) : 60 - 75
  • [7] AUDIO-VISUAL EMOTION RECOGNITION WITH BOOSTED COUPLED HMM
    Lu, Kun
    Jia, Yunde
    [J]. 2012 21ST INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR 2012), 2012, : 1148 - 1151
  • [8] Audio-visual based emotion recognition - A new approach
    Song, ML
    Bu, JJ
    Chen, C
    Li, N
    [J]. PROCEEDINGS OF THE 2004 IEEE COMPUTER SOCIETY CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, VOL 2, 2004, : 1020 - 1025
  • [9] Temporal aggregation of audio-visual modalities for emotion recognition
    Birhala, Andreea
    Ristea, Catalin Nicolae
    Radoi, Anamaria
    Dutu, Liviu Cristian
    [J]. 2020 43RD INTERNATIONAL CONFERENCE ON TELECOMMUNICATIONS AND SIGNAL PROCESSING (TSP), 2020, : 305 - 308
  • [10] Deep emotion recognition based on audio-visual correlation
    Hajarolasvadi, Noushin
    Demirel, Hasan
    [J]. IET COMPUTER VISION, 2020, 14 (07) : 517 - 527