Convolution neural network based automatic speech emotion recognition using Mel-frequency Cepstrum coefficients

Cited by: 30

Authors:

Pawar, Manju D. [1]

Kokate, Rajendra D. [2]

Affiliations:

[1] Maharashtra Inst Technol, Aurangabad, Maharashtra, India

[2] Govt Coll Engn, Jalgaon, Maharashtra, India

Source:

MULTIMEDIA TOOLS AND APPLICATIONS | 2021 / Vol. 80 / Issue 10

Keywords:

Convolution neural network; Feature extraction; Speech emotion recognition; Energy; Pitch; EXTRACTION; FEATURES; MFCC;

DOI:

10.1007/s11042-020-10329-2

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

Speech Emotion Recognition (SER) plays a significant role in affective computing and human-computer interfaces. In the literature, the most widely adopted approach to emotion recognition has been simple feature extraction followed by a simple classifier, and most of these methods recognise emotion with limited efficiency. To address these drawbacks, this paper proposes five different models based on the Convolution Neural Network (CNN) for recognising emotion from speech signals. The proposed methodology uses CNN with feature extraction methods to recognise seven emotions: disgust, normal, fear, joy, anger, sadness and surprise. First, the emotional speech signals are collected from a database such as the Berlin database. Feature extraction is then carried out using Pitch and Energy, Mel-Frequency Cepstral Coefficients (MFCC) and Mel Energy Spectrum Dynamic Coefficients (MEDC). These feature extraction methods are widely used for classifying speech data and perform well. Mel-cepstral coefficients take less time to shape the spectrum given adequate data and offer better voice quality. The extracted features are passed to the CNN for recognition. The proposed CNN contains one or more pairs of convolution and max-pooling layers. Using the CNN, the emotions are recognised from the input speech signal. The proposed method is implemented in MATLAB and compared with an existing method such as Linear Prediction Cepstral Coefficient (LPCC) features with the K-Nearest Neighbour (KNN) classifier, on test samples for performance evaluation. Performance is analysed using statistical measures such as accuracy, precision, specificity, recall, sensitivity, error rate, the receiver operating characteristic (ROC) curve, area under the curve (AUC), and False Positive Rate (FPR).
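The count-based measures named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' MATLAB implementation; the helper names (`one_vs_rest_counts`, `metrics`) and the sample labels are hypothetical and chosen only for illustration. For a multi-class task such as emotion recognition, each emotion is scored one-vs-rest. Note that recall and sensitivity are the same quantity, TP / (TP + FN).

```python
from dataclasses import dataclass


@dataclass
class BinaryCounts:
    tp: int  # true positives
    fp: int  # false positives
    tn: int  # true negatives
    fn: int  # false negatives


def one_vs_rest_counts(y_true, y_pred, label):
    """Count TP/FP/TN/FN treating `label` as positive and all others as negative."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == label and truth == label:
            tp += 1
        elif pred == label:
            fp += 1
        elif truth == label:
            fn += 1
        else:
            tn += 1
    return BinaryCounts(tp, fp, tn, fn)


def metrics(c):
    """Derive the count-based measures; a ratio with a zero denominator is reported as 0.0."""
    def ratio(num, den):
        return num / den if den else 0.0

    total = c.tp + c.fp + c.tn + c.fn
    accuracy = ratio(c.tp + c.tn, total)
    recall = ratio(c.tp, c.tp + c.fn)
    return {
        "accuracy": accuracy,
        "error_rate": 1.0 - accuracy,
        "precision": ratio(c.tp, c.tp + c.fp),
        "recall": recall,
        "sensitivity": recall,  # identical to recall by definition
        "specificity": ratio(c.tn, c.tn + c.fp),
        "fpr": ratio(c.fp, c.fp + c.tn),  # equals 1 - specificity
    }


# Hypothetical labels, for illustration only (not data from the paper).
y_true = ["anger", "joy", "anger", "fear", "joy", "anger"]
y_pred = ["anger", "anger", "anger", "fear", "joy", "joy"]
anger_counts = one_vs_rest_counts(y_true, y_pred, "anger")
anger_metrics = metrics(anger_counts)
```

The ROC curve and AUC are not covered by this sketch, since they require classifier scores swept over a threshold rather than hard predictions.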


Pages: 15563-15587

Page count: 25
