A VECTOR QUANTIZED MASKED AUTOENCODER FOR SPEECH EMOTION RECOGNITION

Cited by: 3
Authors
Sadok, Samir [1]
Leglaive, Simon [1]
Seguier, Renaud [1]
Affiliations
[1] CentraleSupelec, IETR UMR CNRS 6164, Gif Sur Yvette, France
Keywords
Self-supervised learning; masked autoencoder; vector-quantized variational autoencoder; speech emotion recognition; FEATURES;
DOI
10.1109/ICASSPW59220.2023.10193151
Chinese Library Classification (CLC)
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Recent years have seen remarkable progress in speech emotion recognition (SER), thanks to advances in deep learning techniques. However, the limited availability of labeled data remains a significant challenge in the field. Self-supervised learning has recently emerged as a promising solution to address this challenge. In this paper, we propose the vector quantized masked autoencoder for speech (VQ-MAE-S), a self-supervised model that is fine-tuned to recognize emotions from speech signals. The VQ-MAE-S model is based on a masked autoencoder (MAE) that operates in the discrete latent space of a vector quantized variational autoencoder. Experimental results show that the proposed VQ-MAE-S model, pre-trained on the VoxCeleb2 dataset and fine-tuned on emotional speech data, outperforms an MAE working on the raw spectrogram representation and other state-of-the-art methods in SER.
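The two ideas the abstract combines, mapping speech frames to discrete codebook indices and then masking a subset of those indices for reconstruction, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the real VQ-MAE-S relies on a learned VQ-VAE and a trained masked autoencoder, whereas the toy codebook, the function names `quantize` and `mask_tokens`, the mask ratio, and the mask id below are all assumptions chosen for illustration.

```python
import numpy as np

def quantize(frames, codebook):
    """Map each frame vector to the index of its nearest codebook entry."""
    # Squared Euclidean distance between every frame and every codebook vector.
    dist = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return dist.argmin(axis=1)

def mask_tokens(tokens, mask_ratio, mask_id, rng):
    """Replace a random subset of discrete tokens with a mask id."""
    n_mask = int(round(mask_ratio * tokens.size))
    masked_pos = rng.choice(tokens.size, size=n_mask, replace=False)
    corrupted = tokens.copy()
    corrupted[masked_pos] = mask_id
    is_masked = np.zeros(tokens.size, dtype=bool)
    is_masked[masked_pos] = True
    return corrupted, is_masked

rng = np.random.default_rng(0)

# Hypothetical toy codebook: 8 well-separated entries of dimension 4.
codebook = np.arange(8, dtype=float)[:, None] * np.ones((1, 4))

# Stand-in "encoder outputs": noisy copies of known codebook entries.
frames = codebook[[3, 1, 7, 7, 0, 2]] + 0.01 * rng.normal(size=(6, 4))

tokens = quantize(frames, codebook)

# The mask id (8) is one past the last codebook index; half the tokens are hidden.
corrupted, is_masked = mask_tokens(tokens, mask_ratio=0.5, mask_id=8, rng=rng)
```

In a masked-autoencoder setup of this kind, a model would be trained to predict `tokens[is_masked]` from `corrupted`; that prediction step is omitted here because the abstract does not describe it in detail.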
Pages: 5
Related papers
50 in total
  • [31] Speech emotion recognition using Gaussian mixture vector autoregressive models
    El Ayadi, Moataz M. H.
    Kamel, Mohamed S.
    Karray, Fakhri
    2007 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL IV, PTS 1-3, 2007, : 957 - +
  • [32] SPEECH EMOTION RECOGNITION WITH I-VECTOR FEATURE AND RNN MODEL
    Zhang, Teng
    Wu, Ji
    2015 IEEE CHINA SUMMIT & INTERNATIONAL CONFERENCE ON SIGNAL AND INFORMATION PROCESSING, 2015, : 524 - 528
  • [33] An Optimization Method for Support Vector Machine Applied to Speech Emotion Recognition
    Zhang, Wanli
    Li, Guoxin
    Gao, Wei
    2015 4TH INTERNATIONAL CONFERENCE ON MECHANICS AND CONTROL ENGINEERING (ICMCE 2015), 2015, 35
  • [34] An i-vector GPLDA System for Speech based Emotion Recognition
    Gamage, Kalani Wataraka
    Sethu, Vidhyasaharan
    Phu Ngoc Le
    Ambikairajah, Eliathamby
    2015 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA), 2015, : 289 - 292
  • [35] Feature Vector Classification based Speech Emotion Recognition for Service Robots
    Park, Jeong-Sik
    Kim, Ji-Hwan
    Oh, Yung-Hwan
    IEEE TRANSACTIONS ON CONSUMER ELECTRONICS, 2009, 55 (03) : 1590 - 1596
  • [36] Speech Emotion Recognition
    Lalitha, S.
    Madhavan, Abhishek
    Bhushan, Bharath
    Saketh, Srinivas
    2014 INTERNATIONAL CONFERENCE ON ADVANCES IN ELECTRONICS, COMPUTERS AND COMMUNICATIONS (ICAECC), 2014,
  • [37] Set-pMAE: spatial-spEctral-temporal based parallel masked autoEncoder for EEG emotion recognition
    Pan, Chenyu
    Lu, Huimin
    Lin, Chenglin
    Zhong, Zeyi
    Liu, Bing
    COGNITIVE NEURODYNAMICS, 2024, : 3757 - 3773
  • [38] DOMAIN-ADVERSARIAL AUTOENCODER WITH ATTENTION BASED FEATURE LEVEL FUSION FOR SPEECH EMOTION RECOGNITION
    Gao, Yuan
    Liu, JiaXing
    Wang, Longbiao
    Dang, Jianwu
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 6314 - 6318
  • [39] Disentangled Variational Autoencoder for Emotion Recognition in Conversations
    Yang, Kailai
    Zhang, Tianlin
    Ananiadou, Sophia
    IEEE TRANSACTIONS ON AFFECTIVE COMPUTING, 2024, 15 (02) : 508 - 518
  • [40] Predictive Vector Quantized Variational AutoEncoder for Spectral Envelope Quantization
    Srikotr, Tanasan
    Mano, Kazunori
    2020 INTERNATIONAL CONFERENCE ON ELECTRONICS, INFORMATION, AND COMMUNICATION (ICEIC), 2020,