Generative Data Augmentation Guided by Triplet Loss for Speech Emotion Recognition

Authors
Wang, Shijun [1]
Hemati, Hamed [1]
Gudnason, Jon [2]
Borth, Damian [1]
Affiliations
[1] Univ St Gallen, St Gallen, Switzerland
[2] Reykjavik Univ, Reykjavik, Iceland
Keywords
speech emotion recognition; speech augmentation; cross-lingual; adversarial networks; StarGAN
DOI
10.21437/Interspeech.2022-10667
Chinese Library Classification number
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
Speech Emotion Recognition (SER) is crucial for human-computer interaction but remains a challenging problem because of two major obstacles: data scarcity and class imbalance. Many SER datasets are substantially imbalanced, with utterances of one class (most often Neutral) far more frequent than those of the other classes. Furthermore, only a few data resources are available for many spoken languages. To address these problems, we exploit a GAN-based augmentation model guided by a triplet network to improve SER performance given imbalanced and insufficient training data. Our experiments demonstrate two results: 1) On a highly imbalanced dataset, our augmentation strategy significantly improves SER performance (+8% recall score compared with the baseline). 2) In a cross-lingual benchmark, where we train a model with enough source-language utterances but very few target-language utterances (around 50 in our experiments), our augmentation strategy improves SER performance for all three target languages.
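The abstract does not spell out the loss used by the triplet network. As a rough illustration only, the sketch below shows the standard triplet margin loss, max(0, d(a, p) - d(a, n) + margin), on toy 2-D embeddings. The function names, the Euclidean distance, and the margin value are assumptions for illustration, not details taken from the paper.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet margin loss (hypothetical helper, not the paper's code).

    The loss is zero once the negative is at least `margin` farther
    from the anchor than the positive is.
    """
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)

# Toy embeddings: the anchor and positive share an emotion class,
# the negative belongs to a different class.
anchor = [0.0, 0.0]
positive = [0.0, 1.0]   # distance 1 from the anchor
negative = [3.0, 4.0]   # distance 5 from the anchor

loss_easy = triplet_loss(anchor, positive, negative)  # negative already far away
loss_hard = triplet_loss(anchor, negative, positive)  # roles swapped: violated triplet
print(loss_easy, loss_hard)
```

In an augmentation setting, a loss of this shape would push a generated utterance's embedding toward real samples of its target emotion and away from samples of other emotions; how the paper actually combines it with the GAN objective is not stated in this record.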
Pages: 391-395
Number of pages: 5