Deep Convolutional Neural Networks for Feature Extraction in Speech Emotion Recognition

Cited by: 2
Authors
Heracleous, Panikos [1]
Mohammad, Yasser [2]
Yoneyama, Akio [1]
Affiliations
[1] KDDI Res Inc, 2-1-15 Ohara, Fujimino, Saitama 3568502, Japan
[2] AIST, Artificial Intelligence Res Ctr, 2-4-7 Aomi, Koto Ku, Tokyo 1350064, Japan
Keywords
Speech emotion recognition; Deep convolutional neural networks; Informative features; i-vectors; Extremely randomized trees; TRANSFORMATION;
DOI
10.1007/978-3-030-22643-5_9
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Speech emotion recognition is the task of automatically identifying human emotions in spoken utterances. This study focuses on speech emotion recognition based on deep convolutional neural networks (DCNNs) and extremely randomized trees. Specifically, we propose a method in which a DCNN extracts informative features from the speech signal, and an extremely randomized trees classifier then uses those features for emotion recognition. CNNs are a special variant of conventional feed-forward deep neural networks (DNNs) and have been used in many speech applications. A second method is also proposed, which integrates the DCNN with i-vectors for emotion recognition. The proposed methods were evaluated on the state-of-the-art English IEMOCAP and German FAU Aibo emotional corpora for the recognition of four and five emotions, respectively. On the English IEMOCAP corpus, the DCNN with extremely randomized trees obtained a 63.9% unweighted average recall (UAR). On the German children's Aibo corpus, a 61.8% UAR was achieved. These results are promising and show the effectiveness of the proposed methods in speech emotion recognition. The proposed methods were also compared with a baseline approach based on support vector machines (SVMs) and showed superior performance.
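The abstract reports its results as unweighted average recall (UAR), which is the mean of the per-class recalls, so every emotion class counts equally regardless of how many utterances it has. The sketch below illustrates that standard definition only; it is not the authors' evaluation code, and the function name and toy labels are hypothetical.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Return the mean of per-class recalls over the classes present in y_true."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one labelled example is required")

    total = defaultdict(int)    # number of reference examples per class
    correct = defaultdict(int)  # number of correctly recognized examples per class
    for truth, guess in zip(y_true, y_pred):
        total[truth] += 1
        if truth == guess:
            correct[truth] += 1

    # Recall is computed per class first, then averaged without class weights.
    recalls = [correct[label] / total[label] for label in total]
    return sum(recalls) / len(recalls)


# Hypothetical, imbalanced toy labels (not data from the paper).
y_true = ["neutral", "neutral", "neutral", "neutral", "angry", "angry"]
y_pred = ["neutral", "neutral", "neutral", "neutral", "neutral", "angry"]

uar = unweighted_average_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"UAR = {uar:.3f}, accuracy = {accuracy:.3f}")
```

In this toy case the recall is 1.0 for "neutral" and 0.5 for "angry", so the UAR is 0.75 while plain accuracy is 5/6. The gap shows why UAR is often preferred on emotional corpora with unbalanced classes.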
Pages: 117-132
Number of pages: 16
Related papers
50 in total
  • [1] Speech emotion recognition with deep convolutional neural networks
    Issa, Dias
    Demirci, M. Fatih
    Yazici, Adnan
    BIOMEDICAL SIGNAL PROCESSING AND CONTROL, 2020, 59
  • [2] FSER: Deep Convolutional Neural Networks for Speech Emotion Recognition
    Dossou, Bonaventure F. P.
    Gbenou, Yeno K. S.
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW 2021), 2021, : 3526 - 3531
  • [3] Enhancing Speech Emotion Recognition Using Deep Convolutional Neural Networks
    Islam, M. M. Manjurul
    Kabir, Md Alamgir
    Sheikh, Alamin
    Saiduzzaman, Muhammad
    Hafid, Abdelakram
    Abdullah, Saad
    PROCEEDINGS OF THE 2024 9TH INTERNATIONAL CONFERENCE ON MACHINE LEARNING TECHNOLOGIES, ICMLT 2024, 2024, : 95 - 100
  • [4] Improvement on Speech Emotion Recognition Based on Deep Convolutional Neural Networks
    Niu, Yafeng
    Zou, Dongsheng
    Niu, Yadong
    He, Zhongshi
    Tan, Hua
    PROCEEDINGS OF 2018 INTERNATIONAL CONFERENCE ON COMPUTING AND ARTIFICIAL INTELLIGENCE (ICCAI 2018), 2018, : 13 - 18
  • [5] Speech Emotion Recognition using Convolution Neural Networks and Deep Stride Convolutional Neural Networks
    Wani, Taiba Majid
    Gunawan, Teddy Surya
    Qadri, Syed Asif Ahmad
    Mansor, Hasmah
    Kartiwi, Mira
    Ismail, Nanang
    PROCEEDING OF 2020 6TH INTERNATIONAL CONFERENCE ON WIRELESS AND TELEMATICS (ICWT), 2020,
  • [6] Feature Extraction with Handcrafted Methods and Convolutional Neural Networks for Facial Emotion Recognition
    Tsalera, Eleni
    Papadakis, Andreas
    Samarakou, Maria
    Voyiatzis, Ioannis
    APPLIED SCIENCES-BASEL, 2022, 12 (17):
  • [7] An Experimental Study of Speech Emotion Recognition Based on Deep Convolutional Neural Networks
    Zheng, W. Q.
    Yu, J. S.
    Zou, Y. X.
    2015 INTERNATIONAL CONFERENCE ON AFFECTIVE COMPUTING AND INTELLIGENT INTERACTION (ACII), 2015, : 827 - 831
  • [8] FEATURE EXTRACTION USING MULTIMODAL CONVOLUTIONAL NEURAL NETWORKS FOR VISUAL SPEECH RECOGNITION
    Tatulli, Eric
    Hueber, Thomas
    2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 2971 - 2975
  • [9] Continuous Speech Emotion Recognition with Convolutional Neural Networks
    Vryzas, Nikolaos
    Vrysis, Lazaros
    Matsiola, Maria
    Kotsakis, Rigas
    Dimoulas, Charalampos
    Kalliris, George
    JOURNAL OF THE AUDIO ENGINEERING SOCIETY, 2020, 68 (1-2): : 14 - 24