Convolution neural network based automatic speech emotion recognition using Mel-frequency Cepstrum coefficients

Cited by: 1
Authors
Manju D. Pawar
Rajendra D. Kokate
Affiliations
[1] Maharashtra Institute of Technology
[2] Government College of Engineering
Keywords
Convolution neural network; Feature extraction; Speech emotion recognition; Energy; Pitch
DOI
Not available
Abstract
Speech Emotion Recognition (SER) plays a significant role in affective computing and human-computer interfaces. In the literature, the most widely adopted approach to emotion recognition pairs simple feature extraction with a simple classifier, and most such methods recognise emotion with limited efficiency. To address these drawbacks, this paper proposes five different models based on a Convolution Neural Network (CNN) for recognising emotion from speech signals. The proposed methodology uses a CNN with feature extraction to recognise seven emotions: disgust, normal, fear, joy, anger, sadness and surprise. First, speech emotion signals are collected from a database such as the Berlin database. Features are then extracted using pitch and energy, Mel-Frequency Cepstral Coefficients (MFCC) and Mel Energy Spectrum Dynamic Coefficients (MEDC). These feature extraction methods are widely used for classifying speech data and perform well. Mel-cepstral coefficients take less time to shape the spectrum with adequate data and offer better voice quality. The extracted features are passed to the CNN for recognition. The proposed CNN contains one or more pairs of convolution and max-pooling layers. Using this network, emotions are recognised from the input speech signal. The proposed method is implemented in MATLAB and compared with an existing method, such as Linear Prediction Cepstral Coefficients (LPCC) with a K-Nearest Neighbour (KNN) classifier, on test samples for performance evaluation. Performance is analysed using statistical measures: accuracy, precision, specificity, recall, sensitivity, error rate, the receiver operating characteristic (ROC) curve, area under the curve (AUC) and false positive rate (FPR).
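Several of the statistical measures named in the abstract follow directly from confusion counts. The sketch below is illustrative only and is not the authors' MATLAB evaluation: it assumes a one-vs-rest treatment of each emotion class, and the names `one_vs_rest_counts`, `summary_metrics` and the toy labels are hypothetical. Note that recall and sensitivity are the same quantity (the true positive rate), and that the false positive rate equals one minus specificity.

```python
from collections import namedtuple

# Confusion counts for a single emotion treated as the "positive" class.
Counts = namedtuple("Counts", ["tp", "fp", "tn", "fn"])


def one_vs_rest_counts(y_true, y_pred, label):
    """Count TP/FP/TN/FN for `label` against all other classes."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == label and truth == label:
            tp += 1
        elif pred == label and truth != label:
            fp += 1
        elif pred != label and truth == label:
            fn += 1
        else:
            tn += 1
    return Counts(tp, fp, tn, fn)


def _ratio(numerator, denominator):
    # Assumption: report 0.0 when a measure is undefined (empty denominator).
    return numerator / denominator if denominator else 0.0


def summary_metrics(c):
    """Derive the scalar measures from one set of confusion counts."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "accuracy": _ratio(c.tp + c.tn, total),
        "error_rate": _ratio(c.fp + c.fn, total),
        "precision": _ratio(c.tp, c.tp + c.fp),
        "recall": _ratio(c.tp, c.tp + c.fn),       # identical to sensitivity
        "specificity": _ratio(c.tn, c.tn + c.fp),
        "fpr": _ratio(c.fp, c.fp + c.tn),          # equals 1 - specificity
    }


# Toy labels, made up for illustration (not data from the paper).
y_true = ["anger", "anger", "joy", "joy", "fear", "fear"]
y_pred = ["anger", "joy", "joy", "joy", "fear", "anger"]
anger = one_vs_rest_counts(y_true, y_pred, "anger")
anger_metrics = summary_metrics(anger)
```

Repeating the computation for each of the seven emotions and averaging gives a macro-averaged figure. The ROC curve and AUC additionally require per-class scores rather than hard labels, so they are not covered by this sketch.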
Pages: 15563-15587
Page count: 24