MFF-SAug: Multi feature fusion with spectrogram augmentation of speech emotion recognition using convolution neural network

Cited by: 26
Authors:
Jothimani, S. [1]
Premalatha, K. [1]
Affiliations:
[1] Bannari Amman Inst Technol, Dept Comp Sci & Engn, Sathyamangalam 638401, India
Keywords:
Augmentation; Contrastive loss; MFCC; RMS; Speech emotion recognition; ZCR; Accuracy
DOI: 10.1016/j.chaos.2022.112512
Chinese Library Classification (CLC): O1 [Mathematics]
Subject Classification Codes: 0701; 070101
Abstract
Speech Emotion Recognition (SER) is a complex task because of the difficulty of selecting features that reflect emotion in human speech. SER plays a vital and very challenging role in Human-Computer Interaction (HCI). Traditional methods provide inconsistent feature extraction for emotion recognition. The primary aim of this paper is to improve the accuracy of classifying eight emotions from the human voice. The proposed MFF-SAug approach enhances emotion prediction from speech through noise removal, white noise injection, and pitch tuning. On the pre-processed speech signals, the feature extraction techniques Mel Frequency Cepstral Coefficients (MFCC), Zero Crossing Rate (ZCR), and Root Mean Square (RMS) energy are applied and fused to achieve substantial performance for emotion recognition. Augmentation is applied to the raw speech together with a contrastive loss that maximizes agreement between differently augmented samples in the latent space and a reconstruction loss on the input representation, yielding better prediction accuracy. A state-of-the-art Convolutional Neural Network (CNN) is proposed for enhanced speech representation learning and voice emotion classification. Further, the MFF-SAug method is compared with a CNN + LSTM model. The experimental analysis was carried out using the RAVDESS, CREMA, SAVEE, and TESS datasets. The classifier achieved a robust representation for speech emotion recognition with accuracies of 92.6 %, 89.9 %, 84.9 %, and 99.6 % on the RAVDESS, CREMA, SAVEE, and TESS datasets, respectively.
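For illustration only, the sketch below shows how the augmentation (white noise injection, pitch tuning) and the MFCC/ZCR/RMS feature fusion described in the abstract could be realized, assuming the librosa and NumPy libraries. The example clip, frame sizes, MFCC count, noise factor, and pitch-shift amount are illustrative assumptions rather than values taken from the paper, and the contrastive and reconstruction losses are omitted.

```python
# Minimal sketch (not the authors' implementation) of waveform augmentation
# and MFCC/ZCR/RMS feature fusion; all parameter values are assumptions.
import numpy as np
import librosa


def augment(y, sr, noise_factor=0.005, pitch_steps=2):
    """White-noise injection and pitch tuning on a raw waveform."""
    noisy = y + noise_factor * np.random.normal(size=y.shape)
    pitched = librosa.effects.pitch_shift(y, sr=sr, n_steps=pitch_steps)
    return noisy, pitched


def fused_features(y, sr, n_mfcc=40, frame_length=2048, hop_length=512):
    """Concatenate time-averaged MFCC, ZCR, and RMS into one fixed-length vector."""
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)            # (n_mfcc, frames)
    zcr = librosa.feature.zero_crossing_rate(y, frame_length=frame_length,
                                             hop_length=hop_length)   # (1, frames)
    rms = librosa.feature.rms(y=y, frame_length=frame_length,
                              hop_length=hop_length)                  # (1, frames)
    # Average each feature over time so every clip maps to the same vector length.
    return np.hstack([mfcc.mean(axis=1), zcr.mean(axis=1), rms.mean(axis=1)])


if __name__ == "__main__":
    # Stand-in audio clip; in practice this would be a RAVDESS/CREMA/SAVEE/TESS file.
    y, sr = librosa.load(librosa.example("trumpet"), sr=22050)
    noisy, pitched = augment(y, sr)
    x = np.vstack([fused_features(v, sr) for v in (y, noisy, pitched)])
    print(x.shape)  # (3, n_mfcc + 2): one fused feature vector per waveform variant
```

Fused vectors of this kind would then serve as input to a classifier such as the CNN described above; the actual network architecture and training objectives are defined in the paper and are not reproduced here.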
Pages: 18