MFF-SAug: Multi feature fusion with spectrogram augmentation of speech emotion recognition using convolution neural network

Cited by: 26
Authors:
Jothimani, S. [1]
Premalatha, K. [1]
Affiliations:
[1] Bannari Amman Inst Technol, Dept Comp Sci & Engn, Sathyamangalam 638401, India
Keywords:
Augmentation; Contrastive loss; MFCC; RMS; Speech emotion recognition; ZCR; Accuracy
DOI: 10.1016/j.chaos.2022.112512
Chinese Library Classification (CLC): O1 [Mathematics]
Subject Classification Codes: 0701; 070101
Abstract
Speech Emotion Recognition (SER) is a complex task because of the difficulty of selecting features that reflect emotion in human speech. SER plays a vital and very challenging role in Human-Computer Interaction (HCI). Traditional methods provide inconsistent feature extraction for emotion recognition. The primary aim of this paper is to improve the accuracy of classifying eight emotions from the human voice. The proposed MFF-SAug approach enhances emotion prediction from speech through noise removal, white noise injection, and pitch tuning. On the pre-processed speech signals, the feature extraction techniques Mel Frequency Cepstral Coefficients (MFCC), Zero Crossing Rate (ZCR), and Root Mean Square (RMS) energy are applied and fused to achieve substantial performance for emotion recognition. Augmentation is applied to the raw speech together with a contrastive loss that maximizes agreement between differently augmented samples in the latent space and a reconstruction loss on the input representation, yielding better prediction accuracy. A state-of-the-art Convolutional Neural Network (CNN) is proposed for enhanced speech representation learning and voice emotion classification. Further, the MFF-SAug method is compared with a CNN + LSTM model. The experimental analysis was carried out using the RAVDESS, CREMA, SAVEE, and TESS datasets. The classifier achieved a robust representation for speech emotion recognition with accuracies of 92.6 %, 89.9 %, 84.9 %, and 99.6 % on the RAVDESS, CREMA, SAVEE, and TESS datasets, respectively.
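For illustration only, the sketch below shows how the augmentation (white noise injection, pitch tuning) and the MFCC/ZCR/RMS feature fusion described in the abstract could be realized, assuming the librosa and NumPy libraries. The example clip, frame sizes, MFCC count, noise factor, and pitch-shift amount are illustrative assumptions rather than values taken from the paper, and the contrastive and reconstruction losses are omitted.

```python
# Minimal sketch (not the authors' implementation) of waveform augmentation
# and MFCC/ZCR/RMS feature fusion; all parameter values are assumptions.
import numpy as np
import librosa


def augment(y, sr, noise_factor=0.005, pitch_steps=2):
    """White-noise injection and pitch tuning on a raw waveform."""
    noisy = y + noise_factor * np.random.normal(size=y.shape)
    pitched = librosa.effects.pitch_shift(y, sr=sr, n_steps=pitch_steps)
    return noisy, pitched


def fused_features(y, sr, n_mfcc=40, frame_length=2048, hop_length=512):
    """Concatenate time-averaged MFCC, ZCR, and RMS into one fixed-length vector."""
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)            # (n_mfcc, frames)
    zcr = librosa.feature.zero_crossing_rate(y, frame_length=frame_length,
                                             hop_length=hop_length)   # (1, frames)
    rms = librosa.feature.rms(y=y, frame_length=frame_length,
                              hop_length=hop_length)                  # (1, frames)
    # Average each feature over time so every clip maps to the same vector length.
    return np.hstack([mfcc.mean(axis=1), zcr.mean(axis=1), rms.mean(axis=1)])


if __name__ == "__main__":
    # Stand-in audio clip; in practice this would be a RAVDESS/CREMA/SAVEE/TESS file.
    y, sr = librosa.load(librosa.example("trumpet"), sr=22050)
    noisy, pitched = augment(y, sr)
    x = np.vstack([fused_features(v, sr) for v in (y, noisy, pitched)])
    print(x.shape)  # (3, n_mfcc + 2): one fused feature vector per waveform variant
```

Fused vectors of this kind would then serve as input to a classifier such as the CNN described above; the actual network architecture and training objectives are defined in the paper and are not reproduced here.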
Pages: 18