Deep Convolutional Neural Networks for Feature Extraction in Speech Emotion Recognition

Cited by: 2

Authors:

Heracleous, Panikos [1]

Mohammad, Yasser [2]

Yoneyama, Akio [1]

Affiliations:

[1] KDDI Res Inc, 2-1-15 Ohara, Fujimino, Saitama 3568502, Japan

[2] AIST, Artificial Intelligence Res Ctr, 2-4-7 Aomi,Koto Ku, Tokyo 1350064, Japan

Source:

HUMAN-COMPUTER INTERACTION. RECOGNITION AND INTERACTION TECHNOLOGIES, HCI 2019, PT II | 2019 / Vol. 11567

Keywords:

Speech emotion recognition; Deep convolutional neural networks; Informative features; i-vectors; Extremely randomized trees; TRANSFORMATION;

DOI:

10.1007/978-3-030-22643-5_9

Chinese Library Classification number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812;

Abstract:

Speech emotion recognition is the task of automatically identifying human emotions in spoken utterances. The current study focuses on speech emotion recognition based on deep convolutional neural networks (DCNNs) and extremely randomized trees. Specifically, we propose a method in which a DCNN extracts informative features from the speech signal, and an extremely randomized trees classifier then uses those features for emotion recognition. CNNs are a special variant of conventional feed-forward deep neural networks (DNNs) and have been used in many speech applications. A second method is also proposed, which integrates the DCNN with i-vectors for emotion recognition. The proposed methods were evaluated using the state-of-the-art English IEMOCAP and German FAU Aibo emotional corpora for the recognition of four and five emotions, respectively. With the English IEMOCAP corpus and a DCNN with extremely randomized trees, a 63.9% unweighted average recall (UAR) was obtained. With the German children's Aibo corpus, a 61.8% UAR was achieved. These results are promising and show the effectiveness of the proposed methods in speech emotion recognition. The proposed methods were also compared with a baseline approach based on support vector machines (SVM) and showed superior performance.
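
The abstract reports its results as unweighted average recall (UAR), which is the mean of the per-class recalls, so every emotion class counts equally regardless of how many utterances it has. The following is a minimal sketch of that metric only, not the authors' evaluation code. The function name and the toy labels are hypothetical and are not data from the paper.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls; each class counts equally."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    total = defaultdict(int)    # utterances per true class
    correct = defaultdict(int)  # correctly recognized utterances per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    if not total:
        raise ValueError("no labels given")
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Hypothetical toy labels (assumption, not from the paper): an imbalanced set.
y_true = ["neutral"] * 6 + ["happy"] * 2 + ["angry"] * 2
y_pred = ["neutral"] * 6 + ["happy", "neutral"] + ["angry", "neutral"]

uar = unweighted_average_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"UAR = {uar:.3f}, accuracy = {accuracy:.3f}")
```

On this toy set the majority class is recognized perfectly while each minority class is recognized half the time, so UAR comes out lower than plain accuracy. That gap is why UAR is commonly preferred for class-imbalanced emotion corpora.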

Pages: 117-132

Page count: 16
