Temporal conditional Wasserstein GANs for audio-visual affect-related ties

Citations: 0
Authors
Athanasiadis, Christos [1]
Hortal, Enrique [1]
Asteriadis, Stelios [1]
Affiliations
[1] Maastricht Univ, Maastricht, Netherlands
Keywords
Domain Adaptation; Audio Emotion Recognition; Generative Adversarial Networks; Attention Mechanisms;
DOI
10.1109/ACIIW52867.2021.9666277
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Emotion recognition through audio is a challenging task that entails proper feature extraction and classification. State-of-the-art classification strategies are usually based on deep learning architectures, and training complex deep learning networks normally requires very large audiovisual corpora with available emotion annotations. However, such availability is not always guaranteed, since harvesting and annotating such datasets is time-consuming. In this work, temporal conditional Wasserstein Generative Adversarial Networks (tc-wGANs) are introduced to generate robust audio data by leveraging information from a face modality. Taking as input temporal facial features extracted using a dynamic deep learning architecture (based on 3D CNN, LSTM and Transformer networks) and, additionally, conditional information related to annotations, our system manages to generate realistic spectrograms that represent audio clips corresponding to a specific emotional context. As proof of their validity, apart from three quality metrics (Fréchet Inception Distance, Inception Score and Structural Similarity Index), we verified the generated samples by applying an audio-based emotion recognition scheme. When the generated samples are fused with the initial real ones, an improvement of between 3.5 and 5.5% was achieved in audio emotion recognition performance for two state-of-the-art datasets.
Pages: 8
Related papers
50 in total
  • [21] Audio-Visual Event Classification via Spatial-Temporal-Audio Words
    Cao, Yu
    Baang, Sung
    Liu, Shih-Hsi 'Alex'
    Li, Ming
    Hu, Sanqing
    [J]. 19TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION, VOLS 1-6, 2008, : 858 - +
  • [22] Scoring the Nation Audio-Visual Affect in Irish History Films
    Sehman, Steven
    Murchu, Niall o
    [J]. MUSIC SOUND AND THE MOVING IMAGE, 2024, 18 (01) : 29 - 52
  • [23] Audio-visual affect recognition in activation-evaluation space
    Zeng, ZH
    Zhang, ZQ
    Pianfetti, B
    Tu, JL
    Huang, TS
    [J]. 2005 IEEE International Conference on Multimedia and Expo (ICME), Vols 1 and 2, 2005, : 828 - 831
  • [24] The temporal dynamics of conscious and unconscious audio-visual semantic integration
    Gao, Mingjie
    Zhu, Weina
    Drewes, Jan
    [J]. HELIYON, 2024, 10 (13)
  • [25] Recalibration of temporal order perception by exposure to audio-visual asynchrony
    Vroomen, J
    Keetels, M
    de Gelder, B
    Bertelson, P
    [J]. COGNITIVE BRAIN RESEARCH, 2004, 22 (01) : 32 - 35
  • [26] Multimodal and Temporal Perception of Audio-visual Cues for Emotion Recognition
    Ghaleb, Esam
    Popa, Mirela
    Asteriadis, Stylianos
    [J]. 2019 8TH INTERNATIONAL CONFERENCE ON AFFECTIVE COMPUTING AND INTELLIGENT INTERACTION (ACII), 2019
  • [27] Audio-Visual Temporal Saliency Modeling Validated by fMRI Data
    Koutras, Petros
    Panagiotaropoulou, Georgia
    Tsiami, Antigoni
    Maragos, Petros
    [J]. PROCEEDINGS 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS (CVPRW), 2018, : 2081 - 2091
  • [28] Selective Attention Modulates the Direction of Audio-Visual Temporal Recalibration
    Ikumi, Nara
    Soto-Faraco, Salvador
    [J]. PLOS ONE, 2014, 9 (07)
  • [29] ISLA: Temporal Segmentation and Labeling for Audio-Visual Emotion Recognition
    Kim, Yelin
    Provost, Emily Mower
    [J]. IEEE TRANSACTIONS ON AFFECTIVE COMPUTING, 2019, 10 (02) : 196 - 208
  • [30] Window of audio-visual simultaneity is unaffected by spatio-temporal visual clutter
    Erik Van der Burg
    John Cass
    David Alais
    [J]. Scientific Reports, 4