AAD-Net: Advanced end-to-end signal processing system for human emotion detection & recognition using attention-based deep echo state network

Cited by: 32
Authors
Mustaqeem, Khan [1]
El Saddik, Abdulmotaleb [1]
Alotaibi, Fahd Saleh [2]
Pham, Nhat Truong [3]
Affiliations
[1] Mohamed Bin Zayed Univ Artificial Intelligence MBZ, Dept Comp Vis, Abu Dhabi, U Arab Emirates
[2] King Abdulaziz Univ, Fac Comp & Informat Technol, Jeddah, Saudi Arabia
[3] Sungkyunkwan Univ, Coll Biotechnol & Bioengn, Dept Integrat Biotechnol, Computat Biol & Bioinformat Lab, Suwon 16419, South Korea
Funding
National Research Foundation, Singapore;
Keywords
Affective computing; Attention mechanism; Convolution neural network; Echo state networks; Emotion recognition; Human-computer interaction; Audio speech signals; FEATURES; ESN;
DOI
10.1016/j.knosys.2023.110525
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Speech signals are the most convenient way of communication between human beings and the eventual method of Human-Computer Interaction (HCI) for exchanging emotions and information. Recognizing emotions from speech signals is a challenging task due to the sparse nature of emotional data and features. In this article, we propose a Deep Echo-State-Network (DeepESN) system for emotion recognition with a dilated convolution neural network and a multi-headed attention mechanism. To reduce the model complexity, we incorporate a DeepESN that combines reservoir computing for higher-dimensional mapping. We also use a fine-tuned Sparse Random Projection (SRP) to reduce dimensionality, adopt an early fusion strategy to fuse the extracted cues, and pass the joint feature vector through a classification layer to recognize emotions. Our proposed model is evaluated on two public speech corpora, EMO-DB and RAVDESS, and tested for subject/speaker-dependent and subject/speaker-independent performance. The results show that our proposed system achieves high recognition rates of 91.14 and 85.57 for EMO-DB, and 82.01 and 77.02 for RAVDESS, in the speaker-dependent and speaker-independent experiments, respectively. Our proposed system outperforms the State-of-The-Art (SOTA) while requiring less computational time. (c) 2023 Elsevier B.V. All rights reserved.
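The pipeline named in the abstract (stacked reservoirs, sparse random projection, early fusion) can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the authors' implementation: every function name, layer size, and input value is hypothetical, the dilated CNN and multi-headed attention branches are replaced by a placeholder vector, and the reservoir is stabilised with a simple row-sum bound rather than an eigenvalue computation.

```python
import math
import random


def make_reservoir(n_in, n_res, rng, scale=0.9):
    """Random, untrained input and recurrent weights for one reservoir layer."""
    w_in = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_res)]
    w = [[rng.uniform(-1, 1) for _ in range(n_res)] for _ in range(n_res)]
    # Rescale so the largest absolute row sum equals `scale` (< 1). The row-sum
    # norm bounds the spectral radius, so the recurrent map stays contractive.
    norm = max(sum(abs(v) for v in row) for row in w)
    w = [[v * scale / norm for v in row] for row in w]
    return w_in, w


def run_reservoir(w_in, w, inputs):
    """Return the reservoir state after each input frame (weights stay fixed)."""
    state = [0.0] * len(w)
    states = []
    for u in inputs:
        state = [
            math.tanh(sum(a * b for a, b in zip(w_in[i], u))
                      + sum(a * b for a, b in zip(w[i], state)))
            for i in range(len(w))
        ]
        states.append(state)
    return states


def sparse_random_projection(n_in, n_out, rng, density=1 / 3):
    """Matrix whose entries are 0 or +/- sqrt(1 / (density * n_out))."""
    mag = math.sqrt(1.0 / (density * n_out))
    matrix = []
    for _ in range(n_out):
        row = []
        for _ in range(n_in):
            r = rng.random()
            row.append(mag if r < density / 2 else -mag if r < density else 0.0)
        matrix.append(row)
    return matrix


def project(matrix, vec):
    """Multiply a projection matrix by a feature vector."""
    return [sum(a * b for a, b in zip(row, vec)) for row in matrix]


rng = random.Random(0)
# Toy 2-D input sequence standing in for per-frame speech features.
frames = [[math.sin(0.3 * t), math.cos(0.2 * t)] for t in range(20)]

# Two stacked reservoirs: the second layer reads the states of the first.
layer1 = make_reservoir(2, 16, rng)
layer2 = make_reservoir(16, 16, rng)
states1 = run_reservoir(*layer1, frames)
states2 = run_reservoir(*layer2, states1)

# Summarise the sequence by the last state of each layer, then reduce it.
esn_features = states1[-1] + states2[-1]        # 32 values
srp = sparse_random_projection(32, 8, rng)
reduced = project(srp, esn_features)            # 8 values

# Early fusion: concatenate with a second feature vector before classification.
other_features = [0.5, -0.25, 0.1, 0.0]         # placeholder for CNN/attention cues
fused = reduced + other_features                # 12 values
```

In this sketch only a classifier on top of `fused` would need training, since the reservoir and projection weights are random and fixed; that is the usual reason reservoir computing is associated with lower training cost.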
Pages: 11
Related papers
38 in total
  • [1] Real-time emotion recognition using end-to-end attention-based fusion network
    Shit, Sahadeb
    Rana, Aiswarya
    Das, Dibyendu Kumar
    Ray, Dip Narayan
    [J]. JOURNAL OF ELECTRONIC IMAGING, 2023, 32 (01)
  • [2] ADIEU FEATURES? END-TO-END SPEECH EMOTION RECOGNITION USING A DEEP CONVOLUTIONAL RECURRENT NETWORK
    Trigeorgis, George
    Ringeval, Fabien
    Brueckner, Raymond
    Marchi, Erik
    Nicolaou, Mihalis A.
    Schuller, Bjoern
    Zafeiriou, Stefanos
    [J]. 2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 5200 - 5204
  • [3] Depth-based end-to-end deep network for human action recognition
    Chaudhary, Sachin
    Murala, Subrahmanyam
    [J]. IET COMPUTER VISION, 2019, 13 (01) : 15 - 22
  • [4] End-to-End Deep Learning-Based Human Activity Recognition Using Channel State Information
    Hsieh, Chaur-Heh
    Chen, Jen-Yang
    Kuo, Chung-Ming
    Wang, Ping
    [J]. JOURNAL OF INTERNET TECHNOLOGY, 2021, 22 (02) : 271 - 281
  • [5] SPEAKER-AWARE TRAINING OF ATTENTION-BASED END-TO-END SPEECH RECOGNITION USING NEURAL SPEAKER EMBEDDINGS
    Rouhe, Aku
    Kaseva, Tuomas
    Kurimo, Mikko
    [J]. 2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 7064 - 7068
  • [6] End-To-End Speech Emotion Recognition Based on Time and Frequency Information Using Deep Neural Networks
    Bakhshi, Ali
    Wong, Aaron S. W.
    Chalup, Stephan
    [J]. ECAI 2020: 24TH EUROPEAN CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2020, 325 : 969 - 975
  • [7] DeepSNP: An End-to-End Deep Neural Network with Attention-Based Localization for Breakpoint Detection in Single-Nucleotide Polymorphism Array Genomic Data
    Eghbal-Zadeh, Hamid
    Fischer, Lukas
    Popitsch, Niko
    Kromp, Florian
    Taschner-Mandl, Sabine
    Gerber, Teresa
    Bozsaky, Eva
    Ambros, Peter F.
    Ambros, Inge M.
    Widmer, Gerhard
    Moser, Bernhard A.
    [J]. JOURNAL OF COMPUTATIONAL BIOLOGY, 2019, 26 (06) : 572 - 596
  • [8] A Deep Learning-Based End-to-End Composite System for Hand Detection and Gesture Recognition
    Mohammed, Adam Ahmed Qaid
    Lv, Jiancheng
    Islam, Md. Sajjatul
    [J]. SENSORS, 2019, 19 (23)
  • [9] qArI: A Hybrid CTC/Attention-Based Model for Quran Recitation Recognition Using Bidirectional LSTMP in an End-to-End Architecture
    Alfadhli, Sumayya
    Alharbi, Hajar
    Cherif, Asma
    [J]. IEEE ACCESS, 2024, 12 : 95762 - 95777
  • [10] EEG-based emotion recognition using an end-to-end regional-asymmetric convolutional neural network
    Cui, Heng
    Liu, Aiping
    Zhang, Xu
    Chen, Xiang
    Wang, Kongqiao
    Chen, Xun
    [J]. KNOWLEDGE-BASED SYSTEMS, 2020, 205