Speech Emotion Recognition via Generation using an Attention-based Variational Recurrent Neural Network

Cited by: 5

Authors:

Baruah, Murchana [1]

Banerjee, Bonny

Affiliations:

[1] Univ Memphis, Inst Intelligent Syst, Memphis, TN 38152 USA

Source:

INTERSPEECH 2022 | 2022

Keywords:

Speech emotion recognition; recognition by generation; variational RNN; MFCC; attention; active inference; predictive coding; FEATURES

DOI:

10.21437/Interspeech.2022-753

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

The last decade has seen an exponential rise in the number of attention-based models for speech emotion recognition (SER). Most of these models use a spectrogram as the input speech representation and the CNN or RNN or convolutional RNN as the key machine learning (ML) component, and learn feature weights to implement attention. We propose an attention-based model for SER that uses MFCC as the input speech representation and a variational RNN (VRNN) as the key ML component. Since the MFCC is of lower dimension than a spectrogram, the model is size- and data-efficient. The VRNN has been used for problems in vision but rarely for SER. Our model is predictive in nature. At each instant, it infers the emotion class and generates the next observation, computes the generation error, and selectively samples (attends to) the locations of high error. Thus, attention emerges in our model, and does not require learning feature weights. This simple model provides interesting insights when evaluated for SER on benchmark datasets. The model can operate on variable length and infinite duration audio files. This work is the first to explore simultaneous generation and recognition for SER, where the generation capability is necessary for efficient recognition.
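The error-driven attention step described in the abstract (generate the next observation, compute the generation error, attend to the locations of high error) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `generation_error` and `attend_to_high_error`, the use of absolute error, the top-k selection rule, and the toy frame values are all assumptions made for illustration. In the paper the prediction would come from the VRNN; here it is a hard-coded list.

```python
from typing import List, Sequence


def generation_error(predicted: Sequence[float], observed: Sequence[float]) -> List[float]:
    """Absolute per-location error between a generated frame and the observed frame.

    Absolute error is an assumption; the abstract does not specify the error measure.
    """
    if len(predicted) != len(observed):
        raise ValueError("frames must have the same length")
    return [abs(p - o) for p, o in zip(predicted, observed)]


def attend_to_high_error(errors: Sequence[float], k: int) -> List[int]:
    """Return the indices of the k largest errors, in ascending index order.

    Top-k selection is a hypothetical stand-in for the paper's sampling rule.
    """
    ranked = sorted(range(len(errors)), key=lambda i: errors[i], reverse=True)
    return sorted(ranked[:k])


# Toy frames (made-up values standing in for MFCC coefficients).
predicted_frame = [0.0, 1.0, 2.0, 3.0]
observed_frame = [0.1, 3.0, 2.0, 1.5]

errors = generation_error(predicted_frame, observed_frame)
attended = attend_to_high_error(errors, k=2)
print(attended)
```

In this sketch no attention weights are learned: the attended locations follow directly from where the prediction disagrees most with the observation, which is the sense in which the abstract says attention "emerges".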


Pages: 4710-4714

Page count: 5

50 records in total

[41] Segment-Based Speech Emotion Recognition Using Recurrent Neural Networks
Tzinis, Efthymios
Potamianos, Alexandros
[J]. 2017 SEVENTH INTERNATIONAL CONFERENCE ON AFFECTIVE COMPUTING AND INTELLIGENT INTERACTION (ACII), 2017, : 190 - 195
[42] Reconstruction of reservoir rock using attention-based convolutional recurrent neural network
Kumar, Indrajeet
Singh, Anugrah
[J]. Applied Computing and Geosciences, 2024, 24
[43] Speech emotion recognition based on improved masking EMD and convolutional recurrent neural network
Sun, Congshan
Li, Haifeng
Ma, Lin
[J]. FRONTIERS IN PSYCHOLOGY, 2023, 13
[44] An Attention-based Recurrent Convolutional Network for Vehicle Taillight Recognition
Lee, Kuan-Hui
Tagawa, Takaaki
Pan, Jia-En M.
Gaidon, Adrien
Douillard, Bertrand
[J]. 2019 30TH IEEE INTELLIGENT VEHICLES SYMPOSIUM (IV19), 2019, : 2365 - 2370
[45] Attention gated tensor neural network architectures for speech emotion recognition
Pandey, Sandeep Kumar
Shekhawat, Hanumant Singh
Prasanna, S. R. M.
[J]. BIOMEDICAL SIGNAL PROCESSING AND CONTROL, 2022, 71
[46] Multiple attention convolutional-recurrent neural networks for speech emotion recognition
Zhang, Zhihao
Wang, Kunxia
[J]. 2022 10TH INTERNATIONAL CONFERENCE ON AFFECTIVE COMPUTING AND INTELLIGENT INTERACTION WORKSHOPS AND DEMOS, ACIIW, 2022,
[47] Attention-Based Models for Speech Recognition
Chorowski, Jan
Bahdanau, Dzmitry
Serdyuk, Dmitriy
Cho, Kyunghyun
Bengio, Yoshua
[J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 28 (NIPS 2015), 2015, 28
[48] Speech Emotion Recognition using Convolutional and Recurrent Neural Networks
Lim, Wootaek
Jang, Daeyoung
Lee, Taejin
[J]. 2016 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA), 2016,
[49] A novel dual attention-based BLSTM with hybrid features in speech emotion recognition
Chen, Qiupu
Huang, Guimin
[J]. ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2021, 102
[50] Upgraded Attention-Based Local Feature Learning Block for Speech Emotion Recognition
Zhao, Huan
Gao, Yingxue
Xiao, Yufeng
[J]. ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PAKDD 2021, PT II, 2021, 12713 : 118 - 130
