Recognition of emotions in speech using deep CNN and RESNET

Cited by: 4
Authors
Lakshmi, Kanchi Lohitha [1]
Muthulakshmi, P. [2]
Nithya, A. Alice [3]
Jeyavathana, R. Beaulah [3]
Usharani, R. [4]
Das, Nishi S. [5]
Devi, G. Naga Rama [6]
Affiliations
[1] Vasireddy Venkatadri Inst Technol, Dept Comp Sci & Engn, Guntur, AP, India
[2] SRM Inst Sci & Technol, Comp Sci, Chennai, TN, India
[3] SRM Inst Sci & Technol, Dept Computat Intelligence, Chennai, TN, India
[4] SRM Inst Sci & Technol, Dept Comp Sci & Engn, Chennai, TN, India
[5] Baselios Mathew II Coll Engn, Dept EEE, Kollam, Kerala, India
[6] CMR Coll Engn & Technol, Dept Comp Sci & Engn, Hyderabad, Telangana, India
Keywords
CNN; Emotion; MFCC; Speech recognition; Features
DOI
10.1007/s00500-023-07969-5
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Emotional expressions are the behaviours through which we convey our emotional state or attitude to others, and they occur in both verbal and nonverbal communication. Speech emotion recognition, which is a classification task, is among the more difficult problems in data science. In this study, we use two independent datasets, RAVDESS and TESS, each covering seven emotions: neutral, happy, sad, angry, fearful, disgusted, and surprised. Preprocessing and data augmentation are applied to the raw audio waveform by adding noise, time stretching, shifting, and pitch shifting. Features such as MFCC, MFC, and chroma are then extracted. Two models are proposed: a CNN and a ResNet. To improve classification accuracy, we fine-tune the pre-trained model incrementally. Unlike some earlier methods, neither of the proposed models requires the data to be converted into a visual representation; both work directly with the raw sound data. In our experiments, the best-performing model outperforms the existing frameworks on TESS, setting a new benchmark for that dataset.
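The waveform-level augmentations named in the abstract (noise, shifting, stretching) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the `noise_level` and `rate` parameters, and the use of plain linear-interpolation resampling are all assumptions made for illustration. Note that naive resampling changes duration and pitch together; independent time stretching or pitch shifting would normally be done with an audio library and is not shown here.

```python
import math
import random


def add_noise(wave, noise_level=0.005, seed=0):
    """Add zero-mean Gaussian noise scaled to the waveform's peak amplitude.

    `noise_level` is a hypothetical parameter, not a value from the paper.
    """
    rng = random.Random(seed)
    peak = max(abs(s) for s in wave)
    return [s + rng.gauss(0.0, noise_level * peak) for s in wave]


def shift(wave, offset):
    """Circularly shift the waveform by `offset` samples (positive = later)."""
    k = offset % len(wave)
    return wave[-k:] + wave[:-k] if k else list(wave)


def resample(wave, rate):
    """Naive resampling by linear interpolation (illustrative only).

    rate > 1 shortens the clip, rate < 1 lengthens it. Played back at the
    original sample rate, this also changes the pitch.
    """
    n_out = max(1, int(round(len(wave) / rate)))
    last = len(wave) - 1
    out = []
    for i in range(n_out):
        pos = i * rate
        lo = min(int(pos), last)
        hi = min(lo + 1, last)
        frac = pos - lo
        out.append((1 - frac) * wave[lo] + frac * wave[hi])
    return out


# Example: augment 0.1 s of a 440 Hz tone sampled at 16 kHz.
tone = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(1600)]
augmented = [add_noise(tone), shift(tone, 160), resample(tone, 0.8)]
```

Each augmented copy would keep the emotion label of the original clip, which is how such transformations enlarge a training set without new recordings.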
Pages: 17