A VECTOR QUANTIZED MASKED AUTOENCODER FOR SPEECH EMOTION RECOGNITION

Cited by: 3
Authors
Sadok, Samir [1]
Leglaive, Simon [1]
Seguier, Renaud [1]
Affiliations
[1] CentraleSupelec, IETR UMR CNRS 6164, Gif Sur Yvette, France
Keywords
Self-supervised learning; masked autoencoder; vector-quantized variational autoencoder; speech emotion recognition; FEATURES
DOI
10.1109/ICASSPW59220.2023.10193151
Chinese Library Classification
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Recent years have seen remarkable progress in speech emotion recognition (SER), thanks to advances in deep learning techniques. However, the limited availability of labeled data remains a significant challenge in the field. Self-supervised learning has recently emerged as a promising solution to address this challenge. In this paper, we propose the vector quantized masked autoencoder for speech (VQ-MAE-S), a self-supervised model that is fine-tuned to recognize emotions from speech signals. The VQ-MAE-S model is based on a masked autoencoder (MAE) that operates in the discrete latent space of a vector quantized variational autoencoder. Experimental results show that the proposed VQ-MAE-S model, pre-trained on the VoxCeleb2 dataset and fine-tuned on emotional speech data, outperforms an MAE working on the raw spectrogram representation and other state-of-the-art methods in SER.
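To make the core idea of the abstract concrete, the following is a minimal sketch of the two steps it names: mapping continuous latent vectors to discrete codebook indices (vector quantization) and hiding a random subset of those indices so a model can be trained to predict them (masked autoencoding). It is not the authors' implementation. The function names `quantize` and `random_mask`, the toy codebook, the 50% mask ratio, and the use of an extra index as the mask token are all assumptions made for illustration; the abstract does not specify any of them.

```python
import numpy as np

def quantize(latents, codebook):
    """Map each latent vector to the index of its nearest codebook entry.

    latents:  array of shape (T, D), one continuous vector per time step
    codebook: array of shape (K, D), the K learned code vectors
    """
    # Squared Euclidean distance between every latent and every code: (T, K)
    dists = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)

def random_mask(tokens, mask_ratio, mask_id, rng):
    """Replace a random subset of discrete tokens with a mask id.

    Returns the corrupted sequence and a boolean array marking masked positions.
    """
    n = tokens.shape[0]
    n_masked = int(round(mask_ratio * n))
    masked_pos = rng.choice(n, size=n_masked, replace=False)
    mask = np.zeros(n, dtype=bool)
    mask[masked_pos] = True
    corrupted = np.where(mask, mask_id, tokens)
    return corrupted, mask

# Toy setup (hypothetical values): 8 well-separated codes of dimension 4.
rng = np.random.default_rng(0)
codebook = np.repeat(np.arange(8.0)[:, None], 4, axis=1)
true_ids = np.array([3, 1, 7, 7, 0, 2])
latents = codebook[true_ids] + 0.01 * rng.normal(size=(6, 4))

tokens = quantize(latents, codebook)
# Index 8 is outside the codebook range 0..7 and serves as the mask token here.
corrupted, mask = random_mask(tokens, mask_ratio=0.5, mask_id=8, rng=rng)
```

In a full model, a network would take `corrupted` as input and be trained to predict `tokens` at the positions where `mask` is true; that prediction network is omitted here.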
Pages: 5
Related papers
50 in total
  • [1] Autoencoder With Emotion Embedding for Speech Emotion Recognition
    Zhang, Chenghao
    Xue, Lei
    IEEE ACCESS, 2021, 9 : 51231 - 51241
  • [2] Autoencoder with emotion embedding for speech emotion recognition
    Zhang, Chenghao
    Xue, Lei
    IEEE Access, 2021, 9 : 51231 - 51241
  • [3] Speech Emotion Recognition 'in the wild' Using an Autoencoder
    Dissanayake, Vipula
    Zhang, Haimo
    Billinghurst, Mark
    Nanayakkara, Suranga
    INTERSPEECH 2020, 2020, : 526 - 530
  • [4] Sparse Autoencoder with Attention Mechanism for Speech Emotion Recognition
    Sun, Ting-Wei
    Wu, An-Yeu
    2019 IEEE INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE CIRCUITS AND SYSTEMS (AICAS 2019), 2019, : 146 - 149
  • [5] Two-stream Emotion-embedded Autoencoder for Speech Emotion Recognition
    Zhang, Chenghao
    Xue, Lei
    2021 IEEE INTERNATIONAL IOT, ELECTRONICS AND MECHATRONICS CONFERENCE (IEMTRONICS), 2021, : 969 - 974
  • [6] Performance Evaluation of Deep Autoencoder Network for Speech Emotion Recognition
    AndleebSiddiqui, Maria
    Hussain, Wajahat
    Ali, Syed Abbas
    Danish-ur-Rehman
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2020, 11 (02) : 606 - 611
  • [7] Unsupervised Feature Learning for Speech Emotion Recognition Based on Autoencoder
    Ying, Yangwei
    Tu, Yuanwu
    Zhou, Hong
    ELECTRONICS, 2021, 10 (17)
  • [8] SPEECH EMOTION RECOGNITION USING AUTOENCODER BOTTLENECK FEATURES AND LSTM
    Huang, Kun-Yi
    Wu, Chung-Hsien
    Yang, Tsung-Hsien
    Su, Ming-Hsiang
    Chou, Jia-Hui
    2016 INTERNATIONAL CONFERENCE ON ORANGE TECHNOLOGIES (ICOT), 2018, : 1 - 4
  • [9] Vector-quantized Variational Autoencoder for Phase-aware Speech Enhancement
    Tuan Vu Ho
    Quoc Huy Nguyen
    Akagi, Masato
    Unoki, Masashi
    INTERSPEECH 2022, 2022, : 176 - 180
  • [10] An autoencoder-based feature level fusion for speech emotion recognition
    Shixin, Peng
    Kai, Chen
    Tian, Tian
    Jingying, Chen
    Digital Communications and Networks, 2024, 10 (05) : 1341 - 1351