Self-Relation Attention and Temporal Awareness for Emotion Recognition via Vocal Burst

Cited by: 2
Authors
Trinh, Dang-Linh [1]
Vo, Minh-Cong [1]
Kim, Soo-Hyung [1]
Yang, Hyung-Jeong [1]
Lee, Guee-Sang [1]
Affiliation
[1] Chonnam Natl Univ, Dept Artificial Intelligence Convergence, 77 Yongbong Ro, Gwangju 500757, South Korea
Funding
National Research Foundation of Singapore
Keywords
vocal burst; self-supervised model; self-relation attention; temporal awareness; SPEECH; VOICE
DOI
10.3390/s23010200
Chinese Library Classification
O65 [Analytical Chemistry]
Subject classification codes
070302; 081704
Abstract
Speech emotion recognition (SER) has recently attracted considerable research interest. Although much work has been done on this topic, research on emotion recognition from non-verbal vocalizations, known as vocal bursts, remains sparse. Vocal bursts are short and carry no linguistic content, which makes them harder to handle than verbal speech. In this paper, we therefore propose a self-relation attention and temporal awareness (SRA-TA) module for vocal bursts that captures long-term dependencies while focusing on the salient parts of the audio signal. The proposed method has three main stages. First, latent features are extracted from the raw audio signal and its Mel-spectrogram using a self-supervised learning model. Second, the SRA-TA module captures the valuable information in these latent features. Third, all features are concatenated and fed into ten individual fully connected layers to predict the scores of the 10 emotions. The proposed method achieves a mean concordance correlation coefficient (CCC) of 0.7295 on the test set, ranking first in the high-dimensional emotion task of the 2022 ACII Affective Vocal Burst Workshop & Challenge.
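The evaluation metric named in the abstract, the mean concordance correlation coefficient, can be illustrated with a short sketch. For one emotion, Lin's CCC is 2·cov(y, ŷ) / (σ_y² + σ_ŷ² + (μ_y − μ_ŷ)²); the challenge score averages it over the emotion dimensions. The function names `ccc` and `mean_ccc` below are hypothetical, and the sketch assumes population (1/N) moments and an unweighted mean over emotions. It is not the authors' code or the official challenge evaluation script.

```python
from statistics import fmean
from typing import Sequence


def ccc(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Lin's concordance correlation coefficient for one emotion dimension.

    Assumption: population (1/N) mean, variance, and covariance.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mu_t, mu_p = fmean(y_true), fmean(y_pred)
    var_t = fmean((t - mu_t) ** 2 for t in y_true)
    var_p = fmean((p - mu_p) ** 2 for p in y_pred)
    cov = fmean((t - mu_t) * (p - mu_p) for t, p in zip(y_true, y_pred))
    # The squared mean difference penalizes systematic offsets, unlike Pearson's r.
    denom = var_t + var_p + (mu_t - mu_p) ** 2
    if denom == 0.0:
        raise ValueError("CCC is undefined when both inputs are identical constants")
    return 2.0 * cov / denom


def mean_ccc(
    true_by_emotion: Sequence[Sequence[float]],
    pred_by_emotion: Sequence[Sequence[float]],
) -> float:
    """Unweighted mean of per-emotion CCCs (one score list per emotion)."""
    if len(true_by_emotion) != len(pred_by_emotion) or len(true_by_emotion) == 0:
        raise ValueError("need the same non-zero number of emotion dimensions")
    return fmean(ccc(t, p) for t, p in zip(true_by_emotion, pred_by_emotion))
```

For example, `ccc([1, 2, 3], [2, 3, 4])` returns 4/7 (about 0.571) even though the Pearson correlation of the two lists is 1, because CCC also penalizes the constant offset between predictions and targets.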
Pages: 13