GAN-based Data Generation for Speech Emotion Recognition

Cited by: 14
Authors
Eskimez, Sefik Emre [1]
Dimitriadis, Dimitrios [1]
Gmyr, Robert [1]
Kumatani, Kenichi [1]
Affiliation
[1] Microsoft, One Microsoft Way, Redmond, WA 98052 USA
Source: INTERSPEECH 2020
Keywords
speech emotion recognition; generative adversarial networks; data augmentation
DOI
10.21437/Interspeech.2020-2898
Chinese Library Classification (CLC) number
R36 [Pathology]; R76 [Otorhinolaryngology]
Subject classification code
100104; 100213
Abstract
In this work, we propose a GAN-based method to generate synthetic data for speech emotion recognition. Specifically, we investigate the use of GANs to capture the data manifold when the data is eyes-off, i.e., when we can train networks on the data but cannot copy it from the clients. We propose a CNN-based GAN with spectral normalization in both the generator and the discriminator, both of which are pre-trained on large unlabeled speech corpora. We show that our method yields better speech emotion recognition performance than a strong baseline. Furthermore, we show that even after the data on the client is lost, our model can generate similar data that can be used to bootstrap models in the future. Although we evaluate our method on speech emotion recognition, it can be applied to other tasks.
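The abstract states that spectral normalization is applied to both the generator and the discriminator but gives no implementation details. As a hedged illustration only, the sketch below shows the standard power-iteration estimate of a weight matrix's largest singular value σ and the rescaling W/σ described by Miyato et al. (2018); it is not the authors' code. The function names `spectral_norm_estimate` and `spectral_normalize` are hypothetical, and the sketch assumes a convolution kernel has already been reshaped into a 2-D matrix (output channels by all remaining dimensions), a common convention that the abstract does not specify.

```python
import math
import random

# Hypothetical helper names: a sketch of spectral normalization (Miyato et al., 2018),
# NOT the paper's implementation. A conv kernel is assumed to be pre-reshaped to 2-D.
Matrix = list[list[float]]
Vector = list[float]


def _matvec(w: Matrix, v: Vector) -> Vector:
    """Compute W v."""
    return [sum(w_ij * v_j for w_ij, v_j in zip(row, v)) for row in w]


def _matvec_t(w: Matrix, u: Vector) -> Vector:
    """Compute W^T u without materializing the transpose."""
    n_cols = len(w[0])
    return [sum(w[i][j] * u[i] for i in range(len(w))) for j in range(n_cols)]


def _normalize(v: Vector, eps: float = 1e-12) -> Vector:
    """Scale v to unit L2 norm (eps guards against division by zero)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / (norm + eps) for x in v]


def spectral_norm_estimate(w: Matrix, n_iters: int = 50, seed: int = 0) -> float:
    """Estimate the largest singular value sigma(W) by power iteration."""
    rng = random.Random(seed)
    u = _normalize([rng.gauss(0.0, 1.0) for _ in range(len(w))])
    v = _normalize(_matvec_t(w, u))
    for _ in range(n_iters):
        v = _normalize(_matvec_t(w, u))  # right singular vector estimate
        u = _normalize(_matvec(w, v))    # left singular vector estimate
    # sigma is approximated by u^T W v
    return sum(u_i * wv_i for u_i, wv_i in zip(u, _matvec(w, v)))


def spectral_normalize(w: Matrix, n_iters: int = 50) -> Matrix:
    """Return W / sigma(W), so the result has spectral norm close to 1."""
    sigma = spectral_norm_estimate(w, n_iters)
    return [[x / sigma for x in row] for row in w]


if __name__ == "__main__":
    w = [[3.0, 0.0], [0.0, 1.0]]
    print(round(spectral_norm_estimate(w), 6))
    print(round(spectral_norm_estimate(spectral_normalize(w)), 6))
```

In GAN training the usual practice is to keep the vector u between updates and run a single power iteration per step rather than the many iterations used here for a self-contained example; deep-learning frameworks such as PyTorch provide this as a layer wrapper.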
Pages: 3446–3450
Number of pages: 5