Convolution neural network based automatic speech emotion recognition using Mel-frequency Cepstrum coefficients

Cited by: 1
Authors
Manju D. Pawar
Rajendra D. Kokate
Affiliations
[1] Maharashtra Institute of Technology
[2] Government College of Engineering
Keywords
Convolution neural network; Feature extraction; Speech emotion recognition; Energy; Pitch
DOI
Not available
Abstract
Speech Emotion Recognition (SER) plays a significant role in affective computing and human-computer interfaces. In the literature, the most widely adopted approach to emotion recognition combines simple feature extraction with a simple classifier, and most of these methods recognise emotion with limited efficiency. To address these drawbacks, this paper proposes five different models based on the Convolution Neural Network (CNN) for recognising emotion from speech signals. The proposed methodology uses a CNN with feature extraction to recognise seven emotions: disgust, normal, fear, joy, anger, sadness and surprise. First, the emotional speech signals are collected from a database such as the Berlin database. Feature extraction is then carried out using Pitch and Energy, Mel-Frequency Cepstral Coefficients (MFCC) and Mel Energy Spectrum Dynamic Coefficients (MEDC). These features are widely used for classifying speech data and perform well. Mel-cepstral coefficients take little time to shape the spectrum when adequate data is available, and they offer better voice quality. The extracted features are passed to the CNN for recognition. The proposed CNN contains one or more pairs of convolution and max-pooling layers, and it recognises the emotion from the input speech signal. The proposed method is implemented in MATLAB and is compared on test samples with an existing method, Linear Prediction Cepstral Coefficients (LPCC) with a K-Nearest Neighbour (KNN) classifier.
Performance is analysed with statistical measures: accuracy, precision, specificity, recall, sensitivity, error rate, the receiver operating characteristic (ROC) curve, the area under the curve (AUC), and the False Positive Rate (FPR).
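The count-based measures listed above can be illustrated with a short sketch. It is not the authors' MATLAB code; it is a minimal Python example that assumes each emotion is scored one-vs-rest, and the function names (`one_vs_rest_counts`, `summary_metrics`) and the sample labels are hypothetical. Returning 0.0 when a denominator is zero is also a convention chosen here, not something stated in the abstract.

```python
from typing import Dict, Sequence


def one_vs_rest_counts(y_true: Sequence[str], y_pred: Sequence[str],
                       label: str) -> Dict[str, int]:
    """Count TP, FP, TN, FN with `label` treated as the positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for truth, pred in zip(y_true, y_pred):
        if pred == label:
            counts["tp" if truth == label else "fp"] += 1
        else:
            counts["fn" if truth == label else "tn"] += 1
    return counts


def _ratio(numerator: int, denominator: int) -> float:
    # Assumed convention: an undefined ratio is reported as 0.0.
    return numerator / denominator if denominator else 0.0


def summary_metrics(c: Dict[str, int]) -> Dict[str, float]:
    """Derive the count-based measures from one set of confusion counts."""
    total = c["tp"] + c["fp"] + c["tn"] + c["fn"]
    return {
        "accuracy": _ratio(c["tp"] + c["tn"], total),
        "error_rate": _ratio(c["fp"] + c["fn"], total),
        "precision": _ratio(c["tp"], c["tp"] + c["fp"]),
        # Recall and sensitivity are the same quantity: TP / (TP + FN).
        "recall": _ratio(c["tp"], c["tp"] + c["fn"]),
        "specificity": _ratio(c["tn"], c["tn"] + c["fp"]),
        "fpr": _ratio(c["fp"], c["fp"] + c["tn"]),
    }


# Hypothetical labels for six utterances, for illustration only.
y_true = ["anger", "anger", "joy", "fear", "joy", "fear"]
y_pred = ["anger", "joy", "joy", "fear", "joy", "anger"]
anger_counts = one_vs_rest_counts(y_true, y_pred, "anger")
anger_metrics = summary_metrics(anger_counts)
```

The ROC curve and AUC need classifier scores rather than hard labels, so they are left out of this sketch. Note that FPR always equals 1 minus specificity, which gives a quick consistency check on reported results.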
Pages: 15563-15587
Page count: 24