Temporal conditional Wasserstein GANs for audio-visual affect-related ties

Cited by: 0
Authors
Athanasiadis, Christos [1]
Hortal, Enrique [1]
Asteriadis, Stelios [1]
Affiliations
[1] Maastricht Univ, Maastricht, Netherlands
Keywords
Domain Adaptation; Audio Emotion Recognition; Generative Adversarial Networks; Attention Mechanisms
DOI
10.1109/ACIIW52867.2021.9666277
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Emotion recognition from audio is a challenging task that requires careful feature extraction and classification, and state-of-the-art classification strategies are usually based on deep learning architectures. Training such complex deep networks normally requires very large audio-visual corpora with emotion annotations, yet this availability is not always guaranteed, since harvesting and annotating such datasets is time-consuming. In this work, temporal conditional Wasserstein Generative Adversarial Networks (tc-wGANs) are introduced to generate robust audio data by leveraging information from the face modality. Given temporal facial features extracted with a dynamic deep learning architecture (based on 3dCNN, LSTM and Transformer networks), together with conditional information derived from the annotations, our system generates realistic spectrograms representing audio clips that correspond to a specific emotional context. To validate the generated samples, in addition to three quality metrics (Fréchet Inception Distance, Inception Score and Structural Similarity index), we evaluated them with an audio-based emotion recognition scheme. When the generated samples are fused with the original real ones, audio emotion recognition performance improves by 3.5% to 5.5% on two state-of-the-art datasets.
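For readers who want a concrete picture of the conditioning mechanism described in the abstract, the sketch below shows a conditional WGAN with gradient penalty in which a generator maps noise, a temporal facial-feature embedding and an emotion label to a spectrogram, and a critic scores spectrogram/condition pairs. This is not the authors' implementation: all dimensions, layer sizes, tensor shapes and the stand-in random inputs are assumptions for illustration; in the paper the facial embedding would come from the 3dCNN/LSTM/Transformer pipeline.

```python
# Minimal conditional WGAN-GP sketch in the spirit of tc-wGANs (illustrative only;
# shapes and hyperparameters are assumptions, not the authors' configuration).
import torch
import torch.nn as nn

NOISE_DIM, FACE_DIM, N_EMOTIONS = 128, 256, 6   # assumed dimensions
SPEC_SHAPE = (1, 64, 64)                        # assumed mel-spectrogram size

class Generator(nn.Module):
    def __init__(self):
        super().__init__()
        cond_dim = NOISE_DIM + FACE_DIM + N_EMOTIONS
        self.net = nn.Sequential(
            nn.Linear(cond_dim, 256 * 8 * 8), nn.ReLU(),
            nn.Unflatten(1, (256, 8, 8)),
            nn.ConvTranspose2d(256, 128, 4, 2, 1), nn.ReLU(),   # 8x8 -> 16x16
            nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.ReLU(),    # 16x16 -> 32x32
            nn.ConvTranspose2d(64, 1, 4, 2, 1), nn.Tanh(),      # 32x32 -> 64x64
        )

    def forward(self, z, face_feat, label_onehot):
        # Condition the generator on noise, facial embedding and emotion label.
        return self.net(torch.cat([z, face_feat, label_onehot], dim=1))

class Critic(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 64, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Conv2d(128, 256, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Flatten(),
        )
        self.out = nn.Linear(256 * 8 * 8 + FACE_DIM + N_EMOTIONS, 1)

    def forward(self, spec, face_feat, label_onehot):
        # Score a (spectrogram, condition) pair with an unbounded critic output.
        h = self.conv(spec)
        return self.out(torch.cat([h, face_feat, label_onehot], dim=1))

def gradient_penalty(critic, real, fake, face_feat, label_onehot):
    """Standard WGAN-GP penalty on interpolated spectrograms."""
    eps = torch.rand(real.size(0), 1, 1, 1, device=real.device)
    mix = (eps * real + (1 - eps) * fake).requires_grad_(True)
    score = critic(mix, face_feat, label_onehot)
    grads = torch.autograd.grad(score.sum(), mix, create_graph=True)[0]
    return ((grads.flatten(1).norm(2, dim=1) - 1) ** 2).mean()

# One illustrative training step; random tensors stand in for real data and
# for the temporal facial embedding produced by the video branch.
G, D = Generator(), Critic()
opt_g = torch.optim.Adam(G.parameters(), lr=1e-4, betas=(0.5, 0.9))
opt_d = torch.optim.Adam(D.parameters(), lr=1e-4, betas=(0.5, 0.9))

real = torch.rand(8, *SPEC_SHAPE) * 2 - 1   # "real" spectrograms scaled to [-1, 1]
face = torch.randn(8, FACE_DIM)             # placeholder facial-feature embedding
labels = nn.functional.one_hot(torch.randint(0, N_EMOTIONS, (8,)), N_EMOTIONS).float()
z = torch.randn(8, NOISE_DIM)

fake = G(z, face, labels)
d_loss = (D(fake.detach(), face, labels).mean() - D(real, face, labels).mean()
          + 10.0 * gradient_penalty(D, real, fake.detach(), face, labels))
opt_d.zero_grad(); d_loss.backward(); opt_d.step()

g_loss = -D(G(z, face, labels), face, labels).mean()
opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```

The conditioning is done here by simple concatenation of the label and facial embedding with the noise vector and the critic features; other conditioning schemes (e.g. projection or attention-based fusion, as suggested by the paper's keywords) would slot into the same training loop.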
Pages: 8