Multichannel CNN-BLSTM Architecture for Speech Emotion Recognition System by Fusion of Magnitude and Phase Spectral Features Using DCCA for Consumer Applications

Cited by: 19
Authors
Prabhakar, Gudmalwar Ashishkumar [1]
Basel, Biplove [1]
Dutta, Anirban [1]
Rao, Ch. V. Rama [2]
Affiliations
[1] Natl Inst Technol, Dept Elect & Commun Engn, Shillong 793003, India
[2] Natl Inst Technol, Dept Elect & Commun Engn, Warangal 506004, India
Keywords
Feature extraction; Speech recognition; Correlation; Mel frequency cepstral coefficient; Emotion recognition; Databases; Convolutional neural networks; Phase spectral features; convolutional neural network; emotion recognition; MFCC; NEURAL-NETWORKS; MODEL;
DOI
10.1109/TCE.2023.3236972
Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification code
0808; 0809
Abstract
Conventional Speech Emotion Recognition (SER) approaches emphasize magnitude spectrum-based features, such as Mel Frequency Cepstral Coefficients (MFCCs) and the Mel spectrogram. Phase information, however, is usually ignored because of signal processing difficulties such as phase wrapping. This work develops a multichannel Convolutional Neural Network-Bidirectional Long Short-Term Memory (CNN-BLSTM) architecture with an attention mechanism for speaker-independent SER that considers both phase and magnitude spectrum-based features. The phase-based features are extracted using the Modified Group Delay Function (MODGD) and combined with MFCC features. The CNN-BLSTM network extracts learned representations from the magnitude and phase features. The learned representations from MFCCs and MODGD are combined and given as input to a Support Vector Machine (SVM) for classification. Deep Canonical Correlation Analysis (DCCA) is used to maximize the correlation between magnitude and phase information and thereby improve on the performance of a conventional SER system. The IEMOCAP database is used for performance analysis. The experimental results show improvement over MFCC features and over existing approaches for unimodal SER. In this work, we also developed a real-time Web server application for the proposed architecture.
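For readers unfamiliar with the phase feature named in the abstract, the following is a minimal sketch of the modified group delay function for a single speech frame, following its commonly cited textbook definition. It is not the authors' implementation: the function name `modgd` and the values of `n_fft`, `alpha`, `gamma`, and `lifter` are illustrative assumptions, and the abstract does not state which settings the paper uses.

```python
import numpy as np

def modgd(frame, n_fft=512, alpha=0.4, gamma=0.9, lifter=8):
    """Modified group delay of one frame (all parameter values are assumptions)."""
    n = np.arange(len(frame))
    X = np.fft.rfft(frame, n_fft)        # spectrum of x[n]
    Y = np.fft.rfft(n * frame, n_fft)    # spectrum of n * x[n]

    # Cepstrally smoothed magnitude spectrum S: keep only the
    # low-quefrency cepstral terms (and their mirror images).
    log_mag = np.log(np.abs(X) + 1e-10)
    cep = np.fft.irfft(log_mag, n_fft)
    cep[lifter:n_fft - lifter + 1] = 0.0
    S = np.exp(np.fft.rfft(cep, n_fft).real)

    # Group delay with the smoothed spectrum in the denominator,
    # then sign-preserving compression of the dynamic range.
    tau = (X.real * Y.real + X.imag * Y.imag) / (S ** (2 * gamma))
    return np.sign(tau) * np.abs(tau) ** alpha
```

Dividing by the smoothed spectrum rather than the raw magnitude spectrum is what tames the spikes that zeros near the unit circle cause in the plain group delay; the exponents `alpha` and `gamma` then control how strongly the result is compressed.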
Pages: 226-235
Number of pages: 10
Related papers
6 items in total
  • [1] Speech Emotion Recognition Using Magnitude and Phase Features
    D. Ravi Shankar
    R. B. Manjula
    Rajashekhar C. Biradar
    [J]. SN Computer Science, 5 (5)
  • [2] Improving Speech Emotion Recognition System Using Spectral and Prosodic Features
    Chakhtouna, Adil
    Sekkate, Sara
    Adib, Abdellah
    [J]. INTELLIGENT SYSTEMS DESIGN AND APPLICATIONS, ISDA 2021, 2022, 418 : 399 - 409
  • [3] Robust Speech Emotion Recognition System Through Novel ER-CNN and Spectral Features
    Zeeshan, Muhammad
    Qayoom, Huma
    Hassan, Farman
    [J]. 2021 4TH INTERNATIONAL SYMPOSIUM ON ADVANCED ELECTRICAL AND COMMUNICATION TECHNOLOGIES (ISAECT), 2021
  • [4] 1D-CNN: Speech Emotion Recognition System Using a Stacked Network with Dilated CNN Features
    Mustaqeem
    Kwon, Soonil
    [J]. CMC-COMPUTERS MATERIALS & CONTINUA, 2021, 67 (03): 4039 - 4059
  • [5] Deep-Net: A Lightweight CNN-Based Speech Emotion Recognition System Using Deep Frequency Features
    Anvarjon, Tursunov
    Mustaqeem
    Kwon, Soonil
    [J]. SENSORS, 2020, 20 (18) : 1 - 16
  • [6] Advanced Fusion-Based Speech Emotion Recognition System Using a Dual-Attention Mechanism with Conv-Caps and Bi-GRU Features
    Maji, Bubai
    Swain, Monorama
    Mustaqeem
    [J]. ELECTRONICS, 2022, 11 (09)