Recognition of emotions in speech using deep CNN and RESNET

Cited by: 4

Authors:

Lakshmi, Kanchi Lohitha ^{[1]}

Muthulakshmi, P. ^{[2]}

Nithya, A. Alice ^{[3]}

Jeyavathana, R. Beaulah ^{[3]}

Usharani, R. ^{[4]}

Das, Nishi S. ^{[5]}

Devi, G. Naga Rama ^{[6]}

Affiliations:

[1] Vasireddy Venkatadri Inst Technol, Dept Comp Sci & Engn, Guntur, AP, India

[2] SRM Inst Sci & Technol, Comp Sci, Chennai, TN, India

[3] SRM Inst Sci & Technol, Dept Computat Intelligence, Chennai, TN, India

[4] SRM Inst Sci & Technol, Dept Comp Sci & Engn, Chennai, TN, India

[5] Baselios Mathew II Coll Engn, Dept EEE, Kollam, Kerala, India

[6] CMR Coll Engn & Technol, Dept Comp Sci & Engn, Hyderabad, Telangana, India

Source:

SOFT COMPUTING | 2023

Keywords:

CNN; Emotion; MFCC; Speech recognition; FEATURES;

DOI:

10.1007/s00500-023-07969-5

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Emotional expressions are the acts through which we convey our emotional state or attitude to other people, and they manifest in both verbal and nonverbal communication. Speech emotion recognition, often framed as a classification task, is one of the more difficult problems in data science. In this study, we used two independent datasets, RAVDESS and TESS, each covering seven emotions: neutral, happy, sad, angry, fearful, disgusted, and surprised. Preprocessing and data augmentation were applied to the raw audio waveform by adding noise, time stretching, time shifting, and pitch shifting. Features such as MFCC, MFC, and Chroma are then extracted. Two models are proposed: a CNN and RESNET. To improve classification accuracy, we fine-tune the pre-trained model incrementally. In contrast to some earlier methods, none of the presented models requires the data to be converted into a visual representation; they all work directly with the raw sound data. According to our experiments, our best-performing model outperforms the existing frameworks on TESS, establishing a new benchmark for that dataset.
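As a rough illustration of the waveform augmentations named in the abstract, the sketch below applies noise injection, time shifting, and a naive time stretch to a synthetic tone. It is not the authors' pipeline: the function names, parameter values, and the 220 Hz test signal are assumptions chosen for illustration, and the stretch uses plain linear-interpolation resampling (which also changes pitch) rather than a phase vocoder. Pitch shifting is omitted because it needs a proper DSP routine from an audio library.

```python
import numpy as np


def add_noise(wave, noise_factor=0.005, rng=None):
    """Add Gaussian noise scaled relative to the peak amplitude (assumed scheme)."""
    if rng is None:
        rng = np.random.default_rng(0)
    noise = rng.standard_normal(wave.shape[0])
    return wave + noise_factor * np.max(np.abs(wave)) * noise


def shift(wave, max_shift, rng=None):
    """Circularly shift the waveform by a random offset of up to max_shift samples."""
    if rng is None:
        rng = np.random.default_rng(0)
    offset = int(rng.integers(-max_shift, max_shift + 1))
    return np.roll(wave, offset)


def stretch(wave, rate):
    """Naive time stretch by resampling; rate > 1 shortens the signal.

    Unlike a phase-vocoder stretch, this also changes the pitch.
    """
    n_out = int(round(len(wave) / rate))
    old_idx = np.arange(len(wave))
    new_idx = np.linspace(0, len(wave) - 1, n_out)
    return np.interp(new_idx, old_idx, wave)


# Hypothetical input: one second of a 220 Hz tone at 16 kHz.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 220 * t)

augmented = {
    "noise": add_noise(tone),
    "shift": shift(tone, max_shift=sample_rate // 10),
    "stretch_fast": stretch(tone, rate=2.0),
    "stretch_slow": stretch(tone, rate=0.8),
}

for name, wave in augmented.items():
    print(name, wave.shape)
```

Each augmented copy would keep the emotion label of its source clip, so a training set can be enlarged without new recordings.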

Pages: 17

[11] A Deep CNN System for Classification of Emotions Using EEG Signals
Heaton, Jacqueline
Givigi, Sidney
SYSCON 2022: THE 16TH ANNUAL IEEE INTERNATIONAL SYSTEMS CONFERENCE (SYSCON), 2022
[12] Speech Recognition using Deep Learning
Lakkhanawannakun, Phoemporn
Noyunsan, Chaluemwut
2019 34TH INTERNATIONAL TECHNICAL CONFERENCE ON CIRCUITS/SYSTEMS, COMPUTERS AND COMMUNICATIONS (ITC-CSCC 2019), 2019, : 514 - 517
[13] MIST: Multimodal emotion recognition using DeBERTa for text, Semi-CNN for speech, ResNet-50 for facial, and 3D-CNN for motion analysis
Boitel, Enguerrand
Mohasseb, Alaa
Haig, Ella
EXPERT SYSTEMS WITH APPLICATIONS, 2025, 270
[14] Deep-Net: A Lightweight CNN-Based Speech Emotion Recognition System Using Deep Frequency Features
Anvarjon, Tursunov
Mustaqeem
Kwon, Soonil
SENSORS, 2020, 20 (18) : 1 - 16
[15] Deep Neural Network for visual Emotion Recognition based on ResNet50 using Song-Speech characteristics
Ayadi, Souha
Lachiri, Zied
PROCEEDINGS OF THE 2022 5TH INTERNATIONAL CONFERENCE ON ADVANCED SYSTEMS AND EMERGENT TECHNOLOGIES (IC_ASET'2022), 2022, : 363 - 368
[16] Bangla Speech Emotion Recognition and Cross-Lingual Study Using Deep CNN and BLSTM Networks
Sultana, Sadia
Iqbal, M. Zafar
Selim, M. Reza
Rashid, Md. Mijanur
Rahman, M. Shahidur
IEEE ACCESS, 2022, 10 : 564 - 578
[17] Indonesian Continuous Speech Recognition Using CNN and Bidirectional LSTM
Naiborhu, Anwar Petrus F.
Endah, Sukmawati Nur
2021 5TH INTERNATIONAL CONFERENCE ON INFORMATICS AND COMPUTATIONAL SCIENCES (ICICOS 2021), 2021
[18] Recognition of Emotions from Speech using Excitation Source Features
Koolagudi, Shashidhar G.
Devliyal, Swati
Chawla, Bhavna
Barthwal, Anurag
Rao, K. Sreenivasa
INTERNATIONAL CONFERENCE ON MODELLING OPTIMIZATION AND COMPUTING, 2012, 38 : 3409 - 3417
[19] Audio-visual speech recognition using lstm and cnn
El Maghraby E.E.
Gody A.M.
Farouk M.H.
Recent Advances in Computer Science and Communications, 2021, 14 (06) : 2023 - 2039
[20] Learning Salient Features for Speech Emotion Recognition Using CNN
Liu, Jiamu
Han, Wenjing
Ruan, Huabin
Chen, Xiaomin
Jiang, Dongmei
Li, Haifeng
2018 FIRST ASIAN CONFERENCE ON AFFECTIVE COMPUTING AND INTELLIGENT INTERACTION (ACII ASIA), 2018