Recognition of emotions in speech using deep CNN and RESNET

Cited by: 4

Authors:

Lakshmi, Kanchi Lohitha [1]

Muthulakshmi, P. [2]

Nithya, A. Alice [3]

Jeyavathana, R. Beaulah [3]

Usharani, R. [4]

Das, Nishi S. [5]

Devi, G. Naga Rama [6]

Affiliations:

[1] Vasireddy Venkatadri Inst Technol, Dept Comp Sci & Engn, Guntur, AP, India

[2] SRM Inst Sci & Technol, Comp Sci, Chennai, TN, India

[3] SRM Inst Sci & Technol, Dept Computat Intelligence, Chennai, TN, India

[4] SRM Inst Sci & Technol, Dept Comp Sci & Engn, Chennai, TN, India

[5] Baselios Mathew II Coll Engn, Dept EEE, Kollam, Kerala, India

[6] CMR Coll Engn & Technol, Dept Comp Sci & Engn, Hyderabad, Telangana, India

Source:

SOFT COMPUTING | 2023

Keywords:

CNN; Emotion; MFCC; Speech recognition; FEATURES

DOI:

10.1007/s00500-023-07969-5

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Emotional expressions are the acts by which we convey our emotional state or attitude to other people; they manifest through both verbal and nonverbal communication. Speech emotion recognition, often framed as a classification task, is one of the most difficult problems in data science. In this study, we used two independent datasets, RAVDESS and TESS, each covering seven emotions: neutral, happy, sad, angry, fearful, disgusted, and surprised. Preprocessing and data augmentation are applied to the raw audio waveform by adding noise, stretching, shifting, and pitch shifting. Features such as MFCC, MFC, and Chroma are extracted from the audio. Two models are proposed: a CNN and a ResNet. To improve classification accuracy, we fine-tune the pre-trained model incrementally. Unlike some earlier methods, neither model requires the data to be converted into a visual representation; both work directly on the raw audio data. Our experiments show that the best-performing model outperforms all existing frameworks on TESS, setting a new state of the art for that dataset.
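The abstract lists waveform-level augmentations (noise, stretching, shifting, pitching) but gives no parameters. The following is a minimal NumPy sketch of three of them, assuming 1-D float waveforms; the function names and values (`noise_factor`, `max_shift`, `rate`) are hypothetical, not the paper's settings. The stretch here is naive resampling (speed perturbation), so pitch changes together with duration; pitch-preserving time stretching and pitch shifting usually rely on a phase vocoder (for example `librosa.effects.time_stretch` and `librosa.effects.pitch_shift`) and are not shown.

```python
import numpy as np

def add_noise(y, noise_factor=0.005, rng=None):
    """Add white Gaussian noise scaled to noise_factor times the peak amplitude."""
    rng = np.random.default_rng() if rng is None else rng
    return y + noise_factor * np.max(np.abs(y)) * rng.standard_normal(len(y))

def random_shift(y, max_shift, rng=None):
    """Circularly shift the waveform by a random offset in [-max_shift, max_shift] samples."""
    rng = np.random.default_rng() if rng is None else rng
    offset = int(rng.integers(-max_shift, max_shift + 1))
    return np.roll(y, offset)

def resample_stretch(y, rate):
    """Naive stretch by linear-interpolation resampling (speed perturbation).
    rate > 1 shortens the clip; pitch changes along with duration."""
    n_out = int(round(len(y) / rate))
    positions = np.linspace(0, len(y) - 1, n_out)
    return np.interp(positions, np.arange(len(y)), y)

# Usage on a synthetic 1-second, 440 Hz tone (stand-in for a RAVDESS/TESS clip)
sr = 16000
y = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)
rng = np.random.default_rng(0)
augmented = [add_noise(y, rng=rng), random_shift(y, sr // 10, rng=rng), resample_stretch(y, 1.25)]
print([len(a) for a in augmented])  # [16000, 16000, 12800]
```

Each augmented copy would be added to the training set alongside the original clip, which is the usual way such augmentation enlarges a small emotional-speech corpus.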
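MFCCs are the main features named in the abstract. Below is a minimal NumPy sketch of the textbook MFCC recipe (framing, Hamming window, power spectrum, triangular mel filterbank, log, orthonormal DCT-II). The frame length, hop, filter count, and HTK-style mel formula are assumptions for illustration, not the authors' configuration, and the values will not exactly match library implementations such as `librosa.feature.mfcc`, which use different defaults.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    """Triangular filters spaced evenly on the mel scale, shape (n_mels, n_fft//2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):   # rising slope
            fb[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):  # falling slope
            fb[m - 1, k] = (right - k) / max(right - center, 1)
    return fb

def dct_matrix(n_out, n_in):
    """Orthonormal DCT-II basis, shape (n_out, n_in)."""
    n = np.arange(n_in)
    k = np.arange(n_out)[:, None]
    basis = np.cos(np.pi * k * (2 * n + 1) / (2 * n_in))
    basis[0] *= np.sqrt(1.0 / n_in)
    basis[1:] *= np.sqrt(2.0 / n_in)
    return basis

def mfcc(y, sr, n_mfcc=13, n_fft=512, hop=160, n_mels=40):
    """Return an (n_frames, n_mfcc) MFCC matrix; assumes len(y) >= n_fft."""
    n_frames = 1 + (len(y) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = y[idx] * np.hamming(n_fft)                          # windowed frames
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2 / n_fft
    mel_energy = power @ mel_filterbank(n_mels, n_fft, sr).T     # (n_frames, n_mels)
    log_mel = np.log(mel_energy + 1e-10)                         # avoid log(0)
    return log_mel @ dct_matrix(n_mfcc, n_mels).T                # decorrelate with DCT

sr = 16000
sig = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)  # synthetic 1-second tone
print(mfcc(sig, sr).shape)  # (97, 13)
```

Chroma features follow the same framing and FFT steps but pool spectral energy into 12 pitch classes instead of mel bands.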
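The abstract does not describe the ResNet architecture itself. The defining idea of a residual network is the identity shortcut, output = relu(x + F(x)), where F is a small stack of convolutions. Below is a minimal NumPy forward-pass sketch of one basic block, assuming a 1-D convolution over a (channels, time) feature matrix such as a sequence of feature frames; the kernel width, channel count, and omission of batch normalisation are assumptions, not the authors' design.

```python
import numpy as np

def conv1d_same(x, w, b):
    """1-D cross-correlation with 'same' zero padding.
    x: (c_in, time), w: (c_out, c_in, kernel), b: (c_out,) -> (c_out, time)"""
    c_out, c_in, k = w.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, k - 1 - pad)))
    out = np.empty((c_out, x.shape[1]))
    for i in range(x.shape[1]):
        window = xp[:, i:i + k]  # (c_in, k) slice centred on frame i
        out[:, i] = np.tensordot(w, window, axes=([1, 2], [0, 1])) + b
    return out

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, b1, w2, b2):
    """Basic residual block: relu(x + conv(relu(conv(x)))).
    The identity shortcut requires the block to keep the channel count unchanged."""
    h = relu(conv1d_same(x, w1, b1))
    h = conv1d_same(h, w2, b2)
    return relu(x + h)

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 100))  # hypothetical: 8 channels, 100 frames
w1, b1 = 0.1 * rng.standard_normal((8, 8, 3)), np.zeros(8)
w2, b2 = 0.1 * rng.standard_normal((8, 8, 3)), np.zeros(8)
print(residual_block(x, w1, b1, w2, b2).shape)  # (8, 100)
```

Because the shortcut passes x through unchanged, a block whose convolution weights are zero reduces to relu(x), which is what makes deep stacks of such blocks easy to train.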

Pages: 17