Improved Speech Emotion Recognition Using Channel-wise Global Head Pooling (CwGHP)

Times cited: 3
Authors
Chauhan, Krishna [1]
Sharma, Kamalesh Kumar [1]
Varma, Tarun [1]
Affiliation
[1] Malaviya National Institute of Technology Jaipur, Department of Electronics and Communication Engineering, Jaipur 302017, Rajasthan, India
Keywords
Speech emotion recognition; Multihead attention; Convolutional neural network; MFCC; Adaptive pooling; Spectral features; Classification; Attention
DOI
10.1007/s00034-023-02367-6
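Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology]
Subject classification code
0808; 0809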
Abstract
A multihead attention-based convolutional neural network (CNN) architecture, called channel-wise global head pooling (CwGHP), is proposed to improve the classification accuracy of speech emotion recognition (SER). A time-frequency kernel is used in the two-dimensional convolution to emphasize both the time and frequency scales of the mel-frequency cepstral coefficients (MFCCs). Following the CNN encoder, a multihead attention network is optimized to learn salient, discriminative characteristics of the audio samples on three emotional speech datasets: the Interactive Emotional Dyadic Motion Capture (IEMOCAP) database in English, the Berlin Emotional Speech Database (EMO-DB) in German, and the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) in North American English. Results on these datasets, which span different languages, demonstrate the robustness of the proposed model. The model is trained with a chunk-level classification approach in which each segment carries the label of its source utterance; during evaluation, the segment-level emotion predictions are aggregated to classify each whole sample. On the IEMOCAP dataset, the model achieves 84.89% unweighted accuracy (UA) and 82.87% weighted accuracy (WA). This is state-of-the-art performance on this corpus using only the audio modality: compared with the previously reported 79.34% WA and 77.54% UA, the proposed method improves UA by more than 7 percentage points. The model is further validated on the two other datasets through a series of experiments that yield acceptable results. Performance is measured with WA and UA; in addition, per-class precision, recall, and F1-score are used to assess how effectively each emotion class is recognized.
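The chunk-level training and utterance-level aggregation described in the abstract can be illustrated with a minimal sketch. It assumes fixed-length, overlapping chunks of MFCC frames, pads utterances shorter than one chunk by repeating the last frame, and shows two common aggregation rules (averaging chunk posteriors, and majority voting). The chunk length, hop size, padding policy, aggregation rules, and all function names here are hypothetical; they are not details taken from the paper.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def split_into_chunks(frames: Sequence, chunk_len: int, hop: int) -> List[list]:
    """Split an utterance (a sequence of feature frames, e.g. MFCC vectors)
    into fixed-length, possibly overlapping chunks.

    Assumed policy: utterances shorter than chunk_len are padded by repeating
    their last frame; trailing frames that do not fill a whole chunk are dropped.
    """
    if chunk_len <= 0 or hop <= 0:
        raise ValueError("chunk_len and hop must be positive")
    frames = list(frames)
    if not frames:
        raise ValueError("empty utterance")
    if len(frames) < chunk_len:
        frames += [frames[-1]] * (chunk_len - len(frames))
    return [frames[s:s + chunk_len]
            for s in range(0, len(frames) - chunk_len + 1, hop)]


def label_chunks(chunks: List[list], utterance_label: str) -> List[Tuple[list, str]]:
    """Training: every chunk inherits the label of its source utterance."""
    return [(chunk, utterance_label) for chunk in chunks]


def _argmax(scores: Sequence[float]) -> int:
    return max(range(len(scores)), key=lambda k: scores[k])


def aggregate_mean(chunk_probs: List[List[float]]) -> int:
    """Evaluation: average the per-chunk class probabilities, then take argmax."""
    n = len(chunk_probs)
    n_classes = len(chunk_probs[0])
    mean = [sum(p[k] for p in chunk_probs) / n for k in range(n_classes)]
    return _argmax(mean)


def aggregate_vote(chunk_probs: List[List[float]]) -> int:
    """Alternative: majority vote over per-chunk argmax decisions
    (ties go to the class whose vote was counted first)."""
    votes = Counter(_argmax(p) for p in chunk_probs)
    return votes.most_common(1)[0][0]


# Usage: a 10-frame utterance split into 4-frame chunks with a hop of 2.
chunks = split_into_chunks(list(range(10)), chunk_len=4, hop=2)
training_pairs = label_chunks(chunks, "angry")
probs = [[0.6, 0.3, 0.1], [0.2, 0.7, 0.1], [0.5, 0.4, 0.1]]
print(len(chunks), aggregate_mean(probs), aggregate_vote(probs))  # 4 1 0
```

With these example posteriors the two rules disagree (averaging picks class 1, voting picks class 0), so the choice of aggregation rule can change utterance-level decisions.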
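The WA and UA figures above can be made concrete with a small sketch. It follows the convention common in the SER literature (the abstract itself does not define the metrics, so this is an assumption): WA is the overall accuracy across all test samples, and UA is the mean of the per-class recalls, so it is not dominated by the most frequent classes. Per-class precision, recall, and F1-score are computed in the same pass. The function names are hypothetical.

```python
from typing import Dict, Sequence, Tuple


def _check(y_true: Sequence[str], y_pred: Sequence[str]) -> None:
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")


def weighted_accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """WA: fraction of all samples that are classified correctly."""
    _check(y_true, y_pred)
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_prf(y_true: Sequence[str],
                  y_pred: Sequence[str]) -> Dict[str, Tuple[float, float, float]]:
    """Precision, recall and F1-score per class (0.0 where a ratio is undefined)."""
    _check(y_true, y_pred)
    scores = {}
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores[c] = (precision, recall, f1)
    return scores


def unweighted_accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """UA: mean recall over the classes that occur in y_true."""
    prf = per_class_prf(y_true, y_pred)
    recalls = [prf[c][1] for c in set(y_true)]
    return sum(recalls) / len(recalls)


# Usage: an imbalanced toy test set with six "angry" and two "sad" samples.
y_true = ["angry"] * 6 + ["sad"] * 2
y_pred = ["angry"] * 6 + ["angry", "sad"]
print(weighted_accuracy(y_true, y_pred))    # 0.875
print(unweighted_accuracy(y_true, y_pred))  # 0.75
print(per_class_prf(y_true, y_pred)["sad"])
```

In this toy example WA (0.875) exceeds UA (0.75) because the minority class is recognized less well; reporting both, as the abstract does, exposes that kind of imbalance.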
Pages: 5500–5522
Number of pages: 23