Recognition of emotions in speech using deep CNN and RESNET

Cited by: 4
Authors
Lakshmi, Kanchi Lohitha [1]
Muthulakshmi, P. [2]
Nithya, A. Alice [3]
Jeyavathana, R. Beaulah [3]
Usharani, R. [4]
Das, Nishi S. [5]
Devi, G. Naga Rama [6]
Affiliations
[1] Vasireddy Venkatadri Inst Technol, Dept Comp Sci & Engn, Guntur, AP, India
[2] SRM Inst Sci & Technol, Comp Sci, Chennai, TN, India
[3] SRM Inst Sci & Technol, Dept Computat Intelligence, Chennai, TN, India
[4] SRM Inst Sci & Technol, Dept Comp Sci & Engn, Chennai, TN, India
[5] Baselios Mathew II Coll Engn, Dept EEE, Kollam, Kerala, India
[6] CMR Coll Engn & Technol, Dept Comp Sci & Engn, Hyderabad, Telangana, India
Keywords
CNN; Emotion; MFCC; Speech recognition; Features
DOI
10.1007/s00500-023-07969-5
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Emotional expressions are the acts through which we convey our emotional state or attitude to other people, and they manifest through both verbal and nonverbal communication. Speech emotion recognition, usually framed as a classification problem, is one of the most difficult problems in data science. In this study, we used two independent datasets, RAVDESS and TESS, each covering seven distinct emotions: neutral, happy, sad, angry, fearful, disgusted, and surprised. Preprocessing and data augmentation were applied to the raw audio waveform using noise injection, time stretching, time shifting, and pitch shifting. Features such as MFCC, MFC, and chroma are then extracted. Two models are proposed: a CNN and a ResNet. To achieve higher classification accuracy, we fine-tune the pre-trained model incrementally. Unlike some earlier methods, neither proposed model requires the data to be converted into a visual representation; both work directly with the raw audio data. Our experiments show that the best-performing model outperforms all existing frameworks on TESS, establishing a new state of the art for this dataset.
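The waveform-level augmentation step described in the abstract can be sketched roughly as follows. This is a minimal NumPy sketch under stated assumptions, not the authors' implementation: the abstract gives no parameters, so the noise factor, shift range, speed rate, and sample rate below are hypothetical, and the helper names (`add_noise`, `random_shift`, `change_speed`, `augment`) are illustrative. `change_speed` is a naive resampling substitute that changes duration and pitch together; isolated time stretching and pitch shifting are usually done with a phase vocoder instead (for example `librosa.effects.time_stretch` and `librosa.effects.pitch_shift`).

```python
from typing import List, Optional

import numpy as np


def add_noise(y: np.ndarray, noise_factor: float = 0.005,
              rng: Optional[np.random.Generator] = None) -> np.ndarray:
    """Inject white Gaussian noise scaled relative to the clip's peak amplitude."""
    rng = np.random.default_rng() if rng is None else rng
    amplitude = noise_factor * np.max(np.abs(y))
    return y + amplitude * rng.standard_normal(y.shape[0])


def random_shift(y: np.ndarray, max_shift: int,
                 rng: Optional[np.random.Generator] = None) -> np.ndarray:
    """Circularly shift the waveform by a random offset in [-max_shift, max_shift] samples."""
    rng = np.random.default_rng() if rng is None else rng
    offset = int(rng.integers(-max_shift, max_shift + 1))
    return np.roll(y, offset)


def change_speed(y: np.ndarray, rate: float) -> np.ndarray:
    """Naive speed change by linear-interpolation resampling (hypothetical stand-in).

    rate > 1 shortens the clip, rate < 1 lengthens it. Unlike a phase-vocoder
    time stretch, this also raises or lowers the pitch by the same factor.
    """
    n_out = int(round(y.shape[0] / rate))
    old_positions = np.arange(y.shape[0])
    new_positions = np.linspace(0, y.shape[0] - 1, n_out)
    return np.interp(new_positions, old_positions, y)


def augment(y: np.ndarray,
            rng: Optional[np.random.Generator] = None) -> List[np.ndarray]:
    """Return the original clip plus three augmented copies (hypothetical settings)."""
    rng = np.random.default_rng() if rng is None else rng
    return [
        y,
        add_noise(y, noise_factor=0.005, rng=rng),
        random_shift(y, max_shift=y.shape[0] // 10, rng=rng),
        change_speed(y, rate=1.1),
    ]


if __name__ == "__main__":
    sr = 16_000  # assumed sample rate
    t = np.arange(sr) / sr
    clip = np.sin(2 * np.pi * 440 * t)  # 1 s synthetic tone standing in for a speech clip
    for variant in augment(clip, rng=np.random.default_rng(0)):
        print(variant.shape)
```

The clips returned by `augment` would then feed feature extraction; in Python, `librosa.feature.mfcc` and `librosa.feature.chroma_stft` are common choices, though the abstract does not say which tooling the authors used.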
Pages: 17