MFF-SAug: Multi feature fusion with spectrogram augmentation of speech emotion recognition using convolution neural network

Cited by: 26
Authors
Jothimani, S. [1 ]
Premalatha, K. [1 ]
Affiliation
[1] Bannari Amman Inst Technol, Dept Comp Sci & Engn, Sathyamangalam 638401, India
Keywords
Augmentation; Contrastive loss; MFCC; RMS; Speech emotion recognition; ZCR; ACCURACY;
DOI
10.1016/j.chaos.2022.112512
Chinese Library Classification (CLC) number
O1 [Mathematics];
Discipline classification code
0701 ; 070101 ;
Abstract
Speech Emotion Recognition (SER) is a complex task because it depends on selecting features that reflect emotion in human speech. SER plays a vital role in Human-Computer Interaction (HCI) and remains very challenging. Traditional methods provide inconsistent feature extraction for emotion recognition. The primary aim of this paper is to improve the accuracy of classifying eight emotions from the human voice. The proposed MFF-SAug method enhances emotion prediction from speech through noise removal, white noise injection, and pitch tuning. On the pre-processed speech signals, the feature extraction techniques Mel Frequency Cepstral Coefficients (MFCC), Zero Crossing Rate (ZCR), and Root Mean Square (RMS) are applied and combined to achieve substantial performance in emotion recognition. Augmentation is applied to the raw speech for a contrastive loss that maximizes agreement between differently augmented samples in the latent space and reconstructs the loss of input representation for better prediction accuracy. A state-of-the-art Convolution Neural Network (CNN) is proposed for enhanced speech representation learning and voice emotion classification. Further, the MFF-SAug method is compared with a CNN + LSTM model. The experimental analysis was carried out using the RAVDESS, CREMA, SAVEE, and TESS datasets. The classifier achieved a robust representation for speech emotion recognition, with accuracies of 92.6 %, 89.9 %, 84.9 %, and 99.6 % on the RAVDESS, CREMA, SAVEE, and TESS datasets, respectively.
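The abstract names ZCR and RMS as two of the fused features and white noise injection as one of the augmentations. The following is a minimal sketch of those three pieces only, using the Python standard library. It is not the authors' implementation: the frame length, hop length, noise factor, function names, and the simple concatenation used for fusion are all assumptions made for illustration. MFCC and pitch tuning are left out because they need an FFT and a mel filterbank or resampling, which in practice would come from an audio library.

```python
import math
import random


def split_frames(signal, frame_length=2048, hop_length=512):
    """Split a 1-D signal into overlapping frames (no padding; assumed sizes)."""
    last_start = len(signal) - frame_length
    return [signal[i:i + frame_length]
            for i in range(0, last_start + 1, hop_length)]


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:])
                    if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def root_mean_square(frame):
    """Square root of the mean squared amplitude of one frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def add_white_noise(signal, noise_factor=0.005, seed=0):
    """Add Gaussian noise scaled by the signal peak (assumed scaling rule)."""
    rng = random.Random(seed)
    peak = max(abs(x) for x in signal)
    return [x + noise_factor * peak * rng.gauss(0.0, 1.0) for x in signal]


def fused_features(signal, frame_length=2048, hop_length=512):
    """Concatenate per-frame ZCR and RMS into a single feature vector.

    Plain concatenation is an assumption; the abstract only says the
    features are "applied and combined".
    """
    frames = split_frames(signal, frame_length, hop_length)
    zcr = [zero_crossing_rate(f) for f in frames]
    rms = [root_mean_square(f) for f in frames]
    return zcr + rms


# Synthetic stand-in for a speech clip: one second of a 440 Hz tone at 16 kHz.
sample_rate = 16000
tone = [0.5 * math.sin(2 * math.pi * 440 * n / sample_rate)
        for n in range(sample_rate)]

clean = fused_features(tone)                    # features of the original
noisy = fused_features(add_white_noise(tone))   # features of the augmented copy
print(len(clean), len(noisy))
```

In a pipeline like the one the abstract describes, the clean and augmented feature vectors would both be passed to the network, so that a contrastive objective can pull their latent representations together; that training step is outside the scope of this sketch.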
Pages: 18
Related papers
50 in total
  • [1] Improvement Of Speech Emotion Recognition with Neural Network Classifier by Using Speech Spectrogram
    Prasomphan, Sathit
    2015 INTERNATIONAL CONFERENCE ON SYSTEMS, SIGNALS AND IMAGE PROCESSING (IWSSIP 2015), 2015, : 73 - 76
  • [2] A Feature Fusion Model with Data Augmentation for Speech Emotion Recognition
    Tu, Zhongwen
    Liu, Bin
    Zhao, Wei
    Yan, Raoxin
    Zou, Yang
    APPLIED SCIENCES-BASEL, 2023, 13 (07):
  • [3] Convolutional Neural Network with Spectrogram and Perceptual Features for Speech Emotion Recognition
    Zhang, Linjuan
    Wang, Longbiao
    Dang, Jianwu
    Guo, Lili
    Guan, Haotian
    NEURAL INFORMATION PROCESSING (ICONIP 2018), PT IV, 2018, 11304 : 62 - 71
  • [4] A multi-dilated convolution network for speech emotion recognition
    Madanian, Samaneh
    Adeleye, Olayinka
    Templeton, John Michael
    Chen, Talen
    Poellabauer, Christian
    Zhang, Enshi
    Schneider, Sandra L.
    SCIENTIFIC REPORTS, 2025, 15 (01):
  • [5] EEG emotion recognition approach using multi-scale convolution and feature fusion
    Zhang, Yong
    Shan, Qingguo
    Chen, Wenyun
    Liu, Wenzhe
    VISUAL COMPUTER, 2024, : 4157 - 4169
  • [6] Speech emotion recognition based on multi‐feature and multi‐lingual fusion
    Chunyi Wang
    Ying Ren
    Na Zhang
    Fuwei Cui
    Shiying Luo
    Multimedia Tools and Applications, 2022, 81 : 4897 - 4907
  • [7] Convolution Neural Network with Multi-Resolution Feature Fusion for Facial Expression Recognition
    He Zhichao
    Zhao Longzhang
    Chen Chuang
    LASER & OPTOELECTRONICS PROGRESS, 2018, 55 (07)
  • [8] Transformer-Based Multilingual Speech Emotion Recognition Using Data Augmentation and Feature Fusion
    Al-onazi, Badriyya B.
    Nauman, Muhammad Asif
    Jahangir, Rashid
    Malik, Muhmmad Mohsin
    Alkhammash, Eman H.
    Elshewey, Ahmed M.
    APPLIED SCIENCES-BASEL, 2022, 12 (18):
  • [9] Speech Emotion Recognition Based on Multi Acoustic Feature Fusion
    Xiang, Shanshan
    Anwer, Sadiyagul
    Yilahun, Hankiz
    Hamdulla, Askar
    MAN-MACHINE SPEECH COMMUNICATION, NCMMSC 2024, 2025, 2312 : 338 - 346
  • [10] Palm Vein Recognition Using Convolution Neural Network Based on Feature Fusion with HOG Feature
    Kuang, Hailan
    Zhong, Zhenhua
    Liu, Xinhua
    Ma, Xiaolin
    2020 5TH INTERNATIONAL CONFERENCE ON SMART GRID AND ELECTRICAL AUTOMATION (ICSGEA 2020), 2020, : 295 - 299