Speech Emotion Recognition via Generation using an Attention-based Variational Recurrent Neural Network

Cited by: 5
Authors
Baruah, Murchana [1]
Banerjee, Bonny
Affiliations
[1] Univ Memphis, Inst Intelligent Syst, Memphis, TN 38152 USA
Source
Keywords
Speech emotion recognition; recognition by generation; variational RNN; MFCC; attention; active inference; predictive coding; FEATURES;
DOI
10.21437/Interspeech.2022-753
Chinese Library Classification
O42 [Acoustics];
Discipline classification codes
070206; 082403;
Abstract
The last decade has seen an exponential rise in the number of attention-based models for speech emotion recognition (SER). Most of these models use a spectrogram as the input speech representation and the CNN or RNN or convolutional RNN as the key machine learning (ML) component, and learn feature weights to implement attention. We propose an attention-based model for SER that uses MFCC as the input speech representation and a variational RNN (VRNN) as the key ML component. Since the MFCC is of lower dimension than a spectrogram, the model is size- and data-efficient. The VRNN has been used for problems in vision but rarely for SER. Our model is predictive in nature. At each instant, it infers the emotion class and generates the next observation, computes the generation error, and selectively samples (attends to) the locations of high error. Thus, attention emerges in our model, and does not require learning feature weights. This simple model provides interesting insights when evaluated for SER on benchmark datasets. The model can operate on variable length and infinite duration audio files. This work is the first to explore simultaneous generation and recognition for SER, where the generation capability is necessary for efficient recognition.
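The predict, compare, and attend loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' model. The function name `attend_by_generation_error`, the parameter `k`, and the toy 13-coefficient frames are hypothetical. The "generator" is assumed to be a trivial persistence predictor (the next frame is predicted to equal the current one), standing in for the VRNN. Emotion inference is omitted entirely.

```python
def attend_by_generation_error(frames, k=3):
    """Return, for each time step, the k feature indices with the highest
    generation error between the predicted and the observed next frame.

    Assumption: prediction is plain persistence (next frame = current frame),
    a stand-in for the VRNN generator mentioned in the abstract.
    """
    attended = []
    for current, observed in zip(frames, frames[1:]):
        predicted = list(current)  # stand-in "generation" of the next frame
        # Squared generation error at each feature location
        errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
        # Rank locations by error, highest first, and keep the top k
        ranked = sorted(range(len(errors)), key=lambda i: errors[i], reverse=True)
        attended.append(sorted(ranked[:k]))
    return attended


# Toy input: three frames of 13 MFCC-like coefficients (made-up values)
frames = [[0.0] * 13 for _ in range(3)]
frames[1][2], frames[1][5], frames[1][7] = 1.0, 2.0, 3.0
frames[2] = list(frames[1])
frames[2][0] = frames[2][1] = frames[2][12] = 5.0

attended = attend_by_generation_error(frames, k=3)
print(attended)
```

In this sketch the attended locations fall out of the error ranking alone, with no learned attention weights, which is the property the abstract highlights. A real system would replace the persistence predictor with a learned generative model.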
Pages: 4710-4714
Page count: 5