Effective ensembling classification strategy for voice and emotion recognition

Author
Yasser Alharbi
Affiliation
[1] University of Hail, College of Computer Science and Engineering
Keywords
VER; Recognition; CapsNet; R-LSTM; Ensemble learning; Accuracy; Emotion
DOI
Not available
Abstract
Machine learning techniques are among the most effective approaches for Voice and Emotion Recognition (VER). Automatic recognition of voice and emotions is essential for smooth psychosocial interaction between humans and machines. VER research has made considerable progress in building workable systems that combine spectrogram features with deep learning. However, although single Machine Learning (ML) methods deliver acceptable results, they do not yet reach the desired standard. This motivates strategies that combine several ML techniques and target multiple aspects of voice recognition. This article proposes an ensemble classifier model that combines the outputs of two base classifiers (CapsNet and RNN) for VER. The CapsNet model can identify the spatial correlation of vital speech information in spectrograms using a pooling technique. The RNN, in turn, is well suited to processing time-series data, and both models are well known for their classification performance. Stacked generalization is used to construct the ensemble classifier, which integrates the predictions made by the CapsNet and RNN classifiers. The ensemble achieves an overall accuracy of 96.05%, which is higher than that of either CapsNet or RNN alone. A significant benefit of the proposed classifier is that it effectively detects the emotional class 'FEAR', with a recognition rate of 96.68% among seven other classes.
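The stacked generalization step mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the two base classifiers are replaced by a hypothetical `simulated_base_probs` function that returns noisy class probabilities, the meta-learner is assumed to be a small softmax regression, and the three-class setup, learning rate, and sample counts are arbitrary choices for illustration. In a real pipeline the meta-learner would be trained on held-out predictions from the trained CapsNet and RNN models.

```python
import math
import random


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def simulated_base_probs(label, n_classes, rng):
    # Hypothetical stand-in for a base classifier (e.g. CapsNet or RNN):
    # noisy scores that favour the true label, converted to probabilities.
    scores = [rng.gauss(0.0, 1.0) for _ in range(n_classes)]
    scores[label] += 2.0
    return softmax(scores)


def stack_features(probs_a, probs_b):
    """Concatenate the class probabilities of two base classifiers."""
    return list(probs_a) + list(probs_b)


def meta_scores(W, b, x):
    return [sum(w * v for w, v in zip(row, x)) + bias for row, bias in zip(W, b)]


def train_meta(X, y, n_classes, lr=0.1, epochs=50):
    """Fit a softmax-regression meta-learner with plain SGD."""
    n_feat = len(X[0])
    W = [[0.0] * n_feat for _ in range(n_classes)]
    b = [0.0] * n_classes
    for _ in range(epochs):
        for x, target in zip(X, y):
            p = softmax(meta_scores(W, b, x))
            for k in range(n_classes):
                # Cross-entropy gradient with respect to the score of class k.
                grad = p[k] - (1.0 if k == target else 0.0)
                for j in range(n_feat):
                    W[k][j] -= lr * grad * x[j]
                b[k] -= lr * grad
    return W, b


def predict_meta(W, b, x):
    scores = meta_scores(W, b, x)
    return max(range(len(scores)), key=scores.__getitem__)


def make_split(n, n_classes, rng):
    """Build stacked features from two independent simulated base classifiers."""
    X, y = [], []
    for _ in range(n):
        label = rng.randrange(n_classes)
        X.append(stack_features(simulated_base_probs(label, n_classes, rng),
                                simulated_base_probs(label, n_classes, rng)))
        y.append(label)
    return X, y


def demo(seed=0, n_classes=3):
    rng = random.Random(seed)
    X_train, y_train = make_split(200, n_classes, rng)
    X_test, y_test = make_split(100, n_classes, rng)
    W, b = train_meta(X_train, y_train, n_classes)
    correct = sum(predict_meta(W, b, x) == t for x, t in zip(X_test, y_test))
    return correct / len(y_test)


if __name__ == "__main__":
    print(f"meta-classifier accuracy on held-out simulated data: {demo():.2f}")
```

The point of the design is that the meta-learner sees only the base classifiers' predictions, so it can learn how much to trust each one per class instead of averaging them with fixed weights.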
Pages: 334 / 345
Page count: 11
Source
Alharbi, Yasser. Effective ensembling classification strategy for voice and emotion recognition. International Journal of System Assurance Engineering and Management, 2024, 15 (01): 334 - 345