MFF-SAug: Multi feature fusion with spectrogram augmentation of speech emotion recognition using convolution neural network

Times cited: 26
Authors
Jothimani, S. [1]
Premalatha, K. [1]
Affiliation
[1] Bannari Amman Institute of Technology, Department of Computer Science and Engineering, Sathyamangalam 638401, India
Keywords
Augmentation; Contrastive loss; MFCC; RMS; Speech emotion recognition; ZCR; Accuracy
DOI
10.1016/j.chaos.2022.112512
Chinese Library Classification (CLC) number
O1 [Mathematics]
Discipline classification code
0701; 070101
Abstract
Speech Emotion Recognition (SER) is a complex task because it depends on selecting features that reflect emotion in human speech. SER plays a vital role in Human-Computer Interaction (HCI) and remains very challenging. Traditional methods provide inconsistent feature extraction for emotion recognition. The primary aim of this paper is to improve the accuracy of classifying eight emotions from the human voice. The proposed MFF-SAug method enhances emotion prediction from speech through noise removal, white noise injection, and pitch tuning. Three feature extraction techniques, Mel Frequency Cepstral Coefficients (MFCC), Zero Crossing Rate (ZCR), and Root Mean Square (RMS), are applied to the pre-processed speech signals, and the resulting features are combined to improve emotion recognition performance. Augmentation is applied to the raw speech so that a contrastive loss can maximize agreement between differently augmented samples in the latent space, while a reconstruction loss on the input representation further improves prediction accuracy. A state-of-the-art Convolutional Neural Network (CNN) is proposed for improved speech representation learning and voice emotion classification, and the MFF-SAug method is further compared with a CNN + LSTM model. The experimental analysis was carried out on the RAVDESS, CREMA, SAVEE, and TESS datasets. The classifier achieved a robust representation for speech emotion recognition, with accuracies of 92.6%, 89.9%, 84.9%, and 99.6% on RAVDESS, CREMA, SAVEE, and TESS, respectively.
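The sketch below illustrates the kind of feature-fusion front end the abstract describes: white-noise injection, frame-level ZCR and RMS, and MFCCs, combined into one vector per utterance. It is written in Python with NumPy only and is not the authors' implementation.

Several details are assumptions:
- the frame length (2048 samples), hop (512), and noise factor (0.005);
- the number of mel filters (40) and MFCCs (20);
- the fusion rule, which averages each feature over frames and concatenates ZCR, RMS, and MFCC into one vector.

The helper names (add_white_noise, frame_signal, fused_features, and so on) are hypothetical. Noise removal and pitch tuning are not shown. In practice MFCCs would usually come from a library such as librosa rather than this textbook version.

```python
import numpy as np


def add_white_noise(signal, noise_factor=0.005, rng=None):
    """White-noise injection: add Gaussian noise scaled to the signal's peak (factor is assumed)."""
    rng = np.random.default_rng() if rng is None else rng
    amplitude = noise_factor * np.max(np.abs(signal))
    return signal + amplitude * rng.standard_normal(len(signal))


def frame_signal(signal, frame_len=2048, hop=512):
    """Split a 1-D signal into overlapping frames of shape (n_frames, frame_len)."""
    if len(signal) < frame_len:
        signal = np.pad(signal, (0, frame_len - len(signal)))
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return signal[idx]


def zero_crossing_rate(frames):
    """Fraction of adjacent-sample sign changes in each frame."""
    signs = np.signbit(frames)
    return np.mean(signs[:, 1:] != signs[:, :-1], axis=1)


def rms_energy(frames):
    """Root-mean-square amplitude of each frame."""
    return np.sqrt(np.mean(frames ** 2, axis=1))


def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mfcc(frames, sr, n_mels=40, n_mfcc=20):
    """Textbook MFCC: Hann window -> power spectrum -> mel filterbank -> log -> DCT-II."""
    frame_len = frames.shape[1]
    power = np.abs(np.fft.rfft(frames * np.hanning(frame_len), axis=1)) ** 2
    n_bins = power.shape[1]
    # Triangular filters spaced evenly on the mel scale from 0 Hz to the Nyquist frequency.
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((frame_len + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fbank = np.zeros((n_mels, n_bins))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):
            fbank[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fbank[m - 1, k] = (right - k) / max(right - center, 1)
    log_mel = np.log(power @ fbank.T + 1e-10)
    # Unnormalized DCT-II over the mel axis, keeping the first n_mfcc coefficients.
    n = np.arange(n_mels)
    dct = np.cos(np.pi / n_mels * (n[None, :] + 0.5) * np.arange(n_mfcc)[:, None])
    return log_mel @ dct.T  # shape (n_frames, n_mfcc)


def fused_features(signal, sr):
    """Assumed fusion rule: average each feature over frames, then concatenate."""
    frames = frame_signal(signal)
    zcr = zero_crossing_rate(frames).mean(keepdims=True)  # shape (1,)
    rms = rms_energy(frames).mean(keepdims=True)          # shape (1,)
    mfccs = mfcc(frames, sr).mean(axis=0)                 # shape (n_mfcc,)
    return np.concatenate([zcr, rms, mfccs])


if __name__ == "__main__":
    sr = 16000
    t = np.arange(sr) / sr
    clean = 0.5 * np.sin(2 * np.pi * 220.0 * t)  # 1 s synthetic tone as a stand-in for speech
    noisy = add_white_noise(clean, rng=np.random.default_rng(0))
    X = np.stack([fused_features(clean, sr), fused_features(noisy, sr)])
    print(X.shape)  # (2, 22): 1 ZCR + 1 RMS + 20 MFCC values per row
```

Each fused row (one per utterance, or one per augmented copy of an utterance) would then be fed to a classifier such as the CNN described above. Keeping clean and noise-injected copies side by side is one simple way to use white-noise injection as training-data augmentation.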
Pages: 18
Related papers (50 in total)
  • [31] Convolutional neural network-based cross-corpus speech emotion recognition with data augmentation and features fusion
    Jahangir, Rashid
    Teh, Ying Wah
    Mujtaba, Ghulam
    Alroobaea, Roobaea
    Shaikh, Zahid Hussain
    Ali, Ihsan
    MACHINE VISION AND APPLICATIONS, 2022, 33 (03)
  • [32] Speech Emotion Recognition Using Global-Aware Cross-Modal Feature Fusion Network
    Li, Feng
    Luo, Jiusong
    ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, ICIC 2023, PT II, 2023, 14087: 211-221
  • [33] Speech Emotion Recognition Using Multi-granularity Feature Fusion Through Auditory Cognitive Mechanism
    Xu, Cong
    Li, Haifeng
    Bo, Hongjian
    Ma, Lin
    COGNITIVE COMPUTING - ICCC 2019, 2019, 11518: 117-131
  • [34] A Study on Speech Emotion Recognition Using a Deep Neural Network
    Lee, Kyong Hee
    Choi, Hyun Kyun
    Jang, Byung Tae
    Kim, Do Hyun
    2019 10TH INTERNATIONAL CONFERENCE ON INFORMATION AND COMMUNICATION TECHNOLOGY CONVERGENCE (ICTC): ICT CONVERGENCE LEADING THE AUTONOMOUS FUTURE, 2019: 1162-1165
  • [35] Speech Emotion Recognition Using Neural Network and Wavelet Features
    Roy, Tanmoy
    Marwala, Tshilidzi
    Chakraverty, S.
    RECENT TRENDS IN WAVE MECHANICS AND VIBRATIONS, WMVC 2018, 2020: 427-438
  • [36] Speech Emotion Recognition using Context-Aware Dilated Convolution Network
    Kakuba, Samuel
    Han, Dong Seog
    2022 27TH ASIA PACIFIC CONFERENCE ON COMMUNICATIONS (APCC 2022): CREATING INNOVATIVE COMMUNICATION TECHNOLOGIES FOR POST-PANDEMIC ERA, 2022: 601-604
  • [37] Speech emotion recognition based on multi-dimensional feature extraction and multi-scale feature fusion
    Yu, Lingli
    Xu, Fengjun
    Qu, Yundong
    Zhou, Kaijun
    APPLIED ACOUSTICS, 2024, 216
  • [38] A Video Expression Recognition Method Based on Multi-mode Convolution Neural Network and Multiplicative Feature Fusion
    Ren, Qun
    JOURNAL OF INFORMATION PROCESSING SYSTEMS, 2021, 17 (03): 556-570
  • [39] Recognition of speech emotion using custom 2D-convolution neural network deep learning algorithm
    Zvarevashe, Kudakwashe
    Olugbara, Oludayo O.
    INTELLIGENT DATA ANALYSIS, 2020, 24 (05): 1065-1086
  • [40] Convolution neural network based automatic speech emotion recognition using Mel-frequency Cepstrum coefficients
    Pawar, Manju D.
    Kokate, Rajendra D.
    MULTIMEDIA TOOLS AND APPLICATIONS, 2021, 80: 15563-15587