MFF-SAug: Multi feature fusion with spectrogram augmentation of speech emotion recognition using convolution neural network

Cited by: 26
Authors
Jothimani, S. [1]
Premalatha, K. [1]
Affiliations
[1] Bannari Amman Inst Technol, Dept Comp Sci & Engn, Sathyamangalam 638401, India
Keywords
Augmentation; Contrastive loss; MFCC; RMS; Speech emotion recognition; ZCR; ACCURACY;
DOI
10.1016/j.chaos.2022.112512
Chinese Library Classification (CLC) number
O1 [Mathematics];
Discipline classification code
0701 ; 070101 ;
Abstract
Speech Emotion Recognition (SER) is a complex task because it depends on selecting features that reflect emotion in human speech. SER plays a vital role in Human-Computer Interaction (HCI) and remains very challenging. Traditional methods provide inconsistent feature extraction for emotion recognition. The primary aim of this paper is to improve the accuracy of classifying eight emotions from the human voice. The proposed MFF-SAug method enhances emotion prediction from speech through noise removal, white noise injection, and pitch tuning. On the pre-processed speech signals, the feature extraction techniques Mel Frequency Cepstral Coefficients (MFCC), Zero Crossing Rate (ZCR), and Root Mean Square (RMS) are applied and combined to achieve substantial performance in emotion recognition. Augmentation is applied to the raw speech for a contrastive loss that maximizes agreement between differently augmented samples in the latent space and reconstructs the loss of input representation for better prediction accuracy. A state-of-the-art Convolution Neural Network (CNN) is proposed for enhanced speech representation learning and voice emotion classification. Further, the MFF-SAug method is compared with a CNN + LSTM model. The experimental analysis was carried out using the RAVDESS, CREMA, SAVEE, and TESS datasets. The classifier achieved a robust representation for speech emotion recognition, with accuracies of 92.6 %, 89.9 %, 84.9 %, and 99.6 % on the RAVDESS, CREMA, SAVEE, and TESS datasets, respectively.
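As a rough illustration of two ideas named in the abstract (white noise injection and the fusion of frame-level ZCR and RMS features), the following is a minimal sketch in plain Python. It is not the authors' implementation: the function names, the `noise_factor` value, and the frame and hop lengths are all assumptions chosen for the example. MFCC extraction, noise removal, and pitch tuning are omitted; in practice those steps would usually come from an audio library.

```python
import math
import random


def add_white_noise(signal, noise_factor=0.005, seed=0):
    """Return a copy of `signal` with zero-mean Gaussian noise added.

    `noise_factor` is an assumed value, not one taken from the paper.
    """
    rng = random.Random(seed)
    return [s + noise_factor * rng.gauss(0.0, 1.0) for s in signal]


def split_frames(signal, frame_length, hop_length):
    """Split a signal into overlapping frames, dropping any incomplete tail."""
    last_start = len(signal) - frame_length
    return [signal[i:i + frame_length]
            for i in range(0, last_start + 1, hop_length)]


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:])
                    if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def root_mean_square(frame):
    """Root of the mean squared amplitude of the frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def fused_features(signal, frame_length=2048, hop_length=512):
    """Concatenate per-frame ZCR and RMS values into one flat vector."""
    frames = split_frames(signal, frame_length, hop_length)
    zcr = [zero_crossing_rate(f) for f in frames]
    rms = [root_mean_square(f) for f in frames]
    return zcr + rms


# Hypothetical input: one second of a 440 Hz sine sampled at 8 kHz.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 440 * n / sample_rate)
        for n in range(sample_rate)]
noisy = add_white_noise(tone)
features = fused_features(noisy)
```

Here the fused vector simply places the ZCR values before the RMS values; how the paper actually combines MFCC, ZCR, and RMS is not specified in the abstract, so this ordering is only one possible choice.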
Pages: 18