Generative Data Augmentation Guided by Triplet Loss for Speech Emotion Recognition

Authors
Wang, Shijun [1]
Hemati, Hamed [1]
Gudnason, Jon [2]
Borth, Damian [1]
Affiliations
[1] Univ St Gallen, St Gallen, Switzerland
[2] Reykjavik Univ, Reykjavik, Iceland
Keywords
speech emotion recognition; speech augmentation; cross-lingual; adversarial networks; StarGAN
DOI
10.21437/Interspeech.2022-10667
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
Speech Emotion Recognition (SER) is crucial for human-computer interaction but remains a challenging problem because of two major obstacles: data scarcity and imbalance. Many datasets for SER are substantially imbalanced, where utterances of one class (most often Neutral) are much more frequent than those of other classes. Furthermore, only a few data resources are available for many existing spoken languages. To address these problems, we exploit a GAN-based augmentation model guided by a triplet network to improve SER performance given imbalanced and insufficient training data. We conduct experiments and demonstrate: 1) With a highly imbalanced dataset, our augmentation strategy significantly improves the SER performance (+8% recall score compared with the baseline). 2) Moreover, in a cross-lingual benchmark, where we train a model with enough source language utterances but very few target language utterances (around 50 in our experiments), our augmentation strategy brings benefits for the SER performance of all three target languages.
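The abstract does not spell out the triplet objective, so the following is only a minimal sketch of the standard triplet margin loss, which a triplet network typically optimizes. The function names, the Euclidean distance, and the margin value of 1.0 are assumptions for illustration and are not taken from the paper.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet margin loss: max(0, d(a, p) - d(a, n) + margin).

    The loss is zero once the negative is at least `margin` farther from
    the anchor than the positive is. The margin is a hypothetical value.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


# Toy 2-D embeddings (made up for illustration, not real speech features).
anchor = [0.0, 0.0]
positive = [0.0, 1.0]   # same emotion class as the anchor, distance 1
negative = [3.0, 4.0]   # different emotion class, distance 5

loss_easy = triplet_loss(anchor, positive, negative)  # already well separated
loss_hard = triplet_loss(anchor, negative, positive)  # roles swapped, violated
```

In this toy case the first triplet already satisfies the margin and contributes no loss, while the swapped triplet is penalized; how such a signal is combined with the GAN objective is described in the paper itself, not here.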
Pages: 391-395
Page count: 5