A lightweight 2D CNN based approach for speaker-independent emotion recognition from speech with new Indian Emotional Speech Corpora

Authors
Youddha Beer Singh
Shivani Goel
Affiliations
[1] Bennett University, School of Computer Science Engineering and Technology
[2] KIET Group of Institutions, Department of Computer Science and Information Technology
[3] Delhi-NCR
Keywords
Convolutional Neural Network; Indian Emotional Speech Corpora; Spectrogram; Speech Emotion Recognition
DOI
Not available
Abstract
Speech Emotion Recognition (SER) is the process of recognizing emotions by extracting a few features from speech signals. It is becoming very popular in Human Computer Interaction (HCI) applications. The challenge is to extract relevant speech features so that emotions can be recognized at a low computational cost. In this paper, a lightweight Convolutional Neural Network (LCNN) based model is proposed that extracts useful features automatically. The speech samples are converted into spectrograms of size 224 × 224 for input to the LCNN. The model uses five convolutional layers and down-samples the feature maps with strided convolutions in place of pooling layers, which reduces the computational cost. Its accuracy has been evaluated on the publicly available benchmark datasets EMOVO (81%), EMODB (87%), and SAVEE (80%). The accuracy of the proposed model is also found to be better than that of the SER CNN-assisted model and the ResNet-18 and ResNet-34 models. Very few speech datasets are available in an Indian accent, so the authors have created a new Indian Emotional Speech Corpora (IESC) in English, with 600 speech samples recorded from 8 speakers using 2 sentences in 5 emotions. It will be made publicly available for researchers. The accuracy of the proposed LCNN model on IESC is found to be 95%, which is higher than its accuracy on the existing datasets.
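The down-sampling idea in the abstract can be illustrated with the standard convolution output-size formula. The sketch below is only an illustration, not the authors' architecture: the abstract gives the 224 × 224 input and the five convolutional layers, while the kernel size (3), stride (2), and padding (1) are assumptions chosen for the example, as is the choice to apply the stride in every layer.

```python
def conv_output_size(size, kernel, stride, padding):
    """Spatial size after one convolution, using the standard floor formula."""
    return (size + 2 * padding - kernel) // stride + 1


def downsample_trace(size=224, layers=5, kernel=3, stride=2, padding=1):
    """Feature-map side length after each strided convolution.

    kernel, stride, and padding are assumed values for illustration;
    the abstract does not state them.
    """
    sizes = [size]
    for _ in range(layers):
        sizes.append(conv_output_size(sizes[-1], kernel, stride, padding))
    return sizes


trace = downsample_trace()
print(trace)  # [224, 112, 56, 28, 14, 7]
```

Under these assumed settings each strided layer halves the feature map, so no separate pooling layer is needed to reduce the spatial resolution.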
Pages: 23055 - 23073
Page count: 18