Convolution neural network based automatic speech emotion recognition using Mel-frequency Cepstrum coefficients

Cited by: 1

Authors:

Manju D. Pawar

Rajendra D. Kokate

Affiliations:

[1] Maharashtra Institute of Technology

[2] Government College of Engineering

Source:

Multimedia Tools and Applications | 2021 / Vol. 80

Keywords:

Convolution neural network; Feature extraction; Speech emotion recognition; Energy; Pitch

DOI:

Not available

Abstract:

Speech Emotion Recognition (SER) plays a significant role in affective computing and human-computer interfaces. In the literature, the most widely adopted approach to emotion recognition combines simple feature extraction with a simple classifier, and most such methods recognise emotion with limited efficiency. To address these drawbacks, this paper proposes five models based on the Convolution Neural Network (CNN) for recognising emotion from speech signals. The proposed methodology uses a CNN with feature extraction to recognise seven emotions: disgust, normal, fear, joy, anger, sadness and surprise. First, emotional speech signals are collected from a database such as the Berlin database. Feature extraction is then carried out using pitch and energy, Mel-Frequency Cepstral Coefficients (MFCC) and Mel Energy Spectrum Dynamic Coefficients (MEDC). These feature extraction methods are widely used for classifying speech data and perform well. Mel-cepstral coefficients take less time to shape the spectrum with adequate data and offer better voice quality. The extracted features are passed to the CNN for recognition. The proposed CNN contains one or more pairs of convolution and max-pooling layers, and it recognises the emotion from the input speech signal. The proposed method is implemented in MATLAB and compared with an existing method, Linear Prediction Cepstral Coefficients (LPCC) with a K-Nearest Neighbour (KNN) classifier, on test samples for performance evaluation. Performance is analysed using statistical measures: accuracy, precision, specificity, recall, sensitivity, error rate, the receiver operating characteristic (ROC) curve, area under the curve (AUC) and False Positive Rate (FPR).
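
As a reading aid for the evaluation measures named in the abstract, the following is a minimal Python sketch of how the count-based ones are commonly derived from one-vs-rest confusion counts for a single emotion class. It is not the authors' MATLAB implementation; the function name `one_vs_rest_metrics` and the example counts are hypothetical. Under the standard definitions, recall and sensitivity are the same quantity, and the False Positive Rate equals one minus specificity.

```python
def one_vs_rest_metrics(tp, fp, tn, fn):
    """Compute standard metrics from confusion counts for one class.

    tp/fp/tn/fn are true/false positive/negative counts obtained by
    treating one emotion as "positive" and all others as "negative".
    Assumes every denominator below is non-zero.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    return {
        "accuracy": accuracy,
        "error_rate": 1.0 - accuracy,
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),       # also called sensitivity
        "specificity": tn / (tn + fp),
        "fpr": fp / (fp + tn),          # equals 1 - specificity
    }


# Hypothetical counts, for illustration only (not results from the paper).
m = one_vs_rest_metrics(tp=40, fp=10, tn=140, fn=10)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

The ROC curve and AUC are not covered by this sketch, since they require per-sample classifier scores rather than a single set of counts.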

Pages: 15563-15587

Page count: 24
