Convolution neural network based automatic speech emotion recognition using Mel-frequency Cepstrum coefficients

Cited by: 30
Authors
Pawar, Manju D. [1]
Kokate, Rajendra D. [2]
Affiliations
[1] Maharashtra Inst Technol, Aurangabad, Maharashtra, India
[2] Govt Coll Engn, Jalgaon, Maharashtra, India
Keywords
Convolution neural network; Feature extraction; Speech emotion recognition; Energy; Pitch; EXTRACTION; FEATURES; MFCC;
DOI
10.1007/s11042-020-10329-2
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Speech Emotion Recognition (SER) plays a significant role in affective computing and human-computer interfaces. In the literature, the most widely adopted approach to emotion recognition combines simple feature extraction with a simple classifier, and most of these methods have limited efficiency. To address these drawbacks, this paper proposes five different models based on a Convolution Neural Network (CNN) for recognising emotion from speech signals. The proposed methodology uses a CNN with feature extraction methods to recognise seven emotions: disgust, normal, fear, joy, anger, sadness and surprise. First, the speech emotion signals are collected from a database such as the Berlin database. Features are then extracted using pitch and energy, Mel-Frequency Cepstral Coefficients (MFCC) and Mel Energy Spectrum Dynamic Coefficients (MEDC). These feature extraction methods are widely used for classifying speech data and perform well. Mel-cepstral coefficients take less time to shape the spectrum with adequate data and offer better voice quality. The extracted features are passed to the CNN for recognition. The proposed CNN contains one or more pairs of convolution and max-pooling layers. Using the CNN, emotions are recognised from the input speech signal. The proposed method is implemented in MATLAB and compared with an existing method, Linear Prediction Cepstral Coefficients (LPCC) with a K-Nearest Neighbour (KNN) classifier, on test samples for performance evaluation. Performance is analysed using statistical measures such as accuracy, precision, specificity, recall, sensitivity, error rate, the receiver operating characteristic (ROC) curve, the area under the curve (AUC) and the False Positive Rate (FPR).
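The threshold-free measures named in the abstract can be illustrated with a short sketch. The code below is not the authors' MATLAB implementation; it is a minimal Python example that assumes each emotion is scored one-vs-rest against all other classes. The function names and the sample labels are hypothetical and chosen only for illustration. ROC and AUC are omitted because they need classifier scores rather than hard labels.

```python
def accuracy(y_true, y_pred):
    """Fraction of utterances whose predicted emotion matches the true one."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def one_vs_rest_metrics(y_true, y_pred, label):
    """Binary metrics with `label` as the positive class (assumed scheme)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    tn = len(pairs) - tp - fp - fn

    def ratio(num, den):
        # Return 0.0 instead of raising when a class never occurs.
        return num / den if den else 0.0

    return {
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),        # identical to sensitivity
        "specificity": ratio(tn, tn + fp),
        "fpr": ratio(fp, fp + tn),           # equals 1 - specificity
    }


# Hypothetical labels for six utterances, for illustration only.
y_true = ["anger", "anger", "joy", "joy", "fear", "sadness"]
y_pred = ["anger", "joy", "joy", "joy", "fear", "fear"]

acc = accuracy(y_true, y_pred)
error_rate = 1.0 - acc
joy = one_vs_rest_metrics(y_true, y_pred, "joy")
print(f"accuracy={acc:.3f} error_rate={error_rate:.3f} joy={joy}")
```

Recall and sensitivity are the same quantity, and the false positive rate is the complement of specificity, so the list in the abstract contains fewer independent measures than it names.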
Pages: 15563-15587
Page count: 25
Related papers
50 in total
  • [1] Convolution neural network based automatic speech emotion recognition using Mel-frequency Cepstrum coefficients
    Manju D. Pawar
    Rajendra D. Kokate
    [J]. Multimedia Tools and Applications, 2021, 80 : 15563 - 15587
  • [2] Emotion Recognition from Speech Signal Using Mel-Frequency Cepstral Coefficients
    Korkmaz, Onur Erdem
    Atasoy, Ayten
    [J]. 2015 9TH INTERNATIONAL CONFERENCE ON ELECTRICAL AND ELECTRONICS ENGINEERING (ELECO), 2015, : 1254 - 1257
  • [3] Recognition of Human Speech Emotion Using Variants of Mel-Frequency Cepstral Coefficients
    Palo, Hemanta Kumar
    Chandra, Mahesh
    Mohanty, Mihir Narayan
    [J]. ADVANCES IN SYSTEMS, CONTROL AND AUTOMATION, 2018, 442 : 491 - 498
  • [4] Speaker Recognition Using Mel-Frequency Cepstrum Coefficients and Sum Square Error
    Charisma, Atik
    Hidayat, M. Reza
    Zainal, Yuda Bakti
    [J]. 2017 3RD INTERNATIONAL CONFERENCE ON WIRELESS AND TELEMATICS (ICWT), 2017, : 160 - 163
  • [5] ACOUSTIC PORNOGRAPHY RECOGNITION USING FUSED PITCH AND MEL-FREQUENCY CEPSTRUM COEFFICIENTS
    Banaeeyan, Rasoul
    Karim, Hezerul Abdul
    Lye, Haris
    Fauzi, Mohamad Faizal Ahmad
    Mansor, Sarina
    See, John
    [J]. INTERNATIONAL JOURNAL OF TECHNOLOGY, 2019, 10 (07) : 1335 - 1343
  • [6] Automatic speech recognition using Mel-frequency cepstrum coefficient (MFCC) and vector quantization (VQ) techniques for continuous speech
    Verma, Amit
    Kumar, Amit
    Kaur, Iqbaldeep
    [J]. INTERNATIONAL JOURNAL OF ADVANCED AND APPLIED SCIENCES, 2018, 5 (04): : 73 - 78
  • [7] Seal call recognition based on general regression neural network using Mel-frequency cepstrum coefficient features
    Yao, Qihai
    Wang, Yong
    Yang, Yixin
    Shi, Yang
    [J]. EURASIP JOURNAL ON ADVANCES IN SIGNAL PROCESSING, 2023, 2023 (01)
  • [8] RECOGNITION OF NON-SPEECH SOUNDS USING MEL-FREQUENCY CEPSTRUM COEFFICIENTS AND DYNAMIC TIME WARPING METHOD
    Disken, Gokay
    Ibrikci, Turgay
    [J]. 2015 23RD SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE (SIU), 2015, : 144 - 147
  • [9] Boosting speech/non-speech classification using averaged Mel-frequency Cepstrum Coefficients features
    Xiong, ZY
    Huang, TS
    [J]. ADVANCES IN MULTIMEDIA INFORMATION PROCESSING - PCM 2002, PROCEEDING, 2002, 2532 : 573 - 580