A VECTOR QUANTIZED MASKED AUTOENCODER FOR SPEECH EMOTION RECOGNITION

Cited by: 3
Authors
Sadok, Samir [1]
Leglaive, Simon [1]
Seguier, Renaud [1]
Affiliations
[1] CentraleSupelec, IETR UMR CNRS 6164, Gif Sur Yvette, France
Keywords
Self-supervised learning; masked autoencoder; vector-quantized variational autoencoder; speech emotion recognition; FEATURES
DOI
10.1109/ICASSPW59220.2023.10193151
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Recent years have seen remarkable progress in speech emotion recognition (SER), thanks to advances in deep learning techniques. However, the limited availability of labeled data remains a significant challenge in the field. Self-supervised learning has recently emerged as a promising solution to address this challenge. In this paper, we propose the vector quantized masked autoencoder for speech (VQ-MAE-S), a self-supervised model that is fine-tuned to recognize emotions from speech signals. The VQ-MAE-S model is based on a masked autoencoder (MAE) that operates in the discrete latent space of a vector quantized variational autoencoder. Experimental results show that the proposed VQ-MAE-S model, pre-trained on the VoxCeleb2 dataset and fine-tuned on emotional speech data, outperforms an MAE working on the raw spectrogram representation and other state-of-the-art methods in SER.
Pages: 5