Self-Relation Attention and Temporal Awareness for Emotion Recognition via Vocal Burst

Cited by: 2
Authors
Trinh, Dang-Linh [1 ]
Vo, Minh-Cong [1 ]
Kim, Soo-Hyung [1 ]
Yang, Hyung-Jeong [1 ]
Lee, Guee-Sang [1 ]
Affiliations
[1] Chonnam Natl Univ, Dept Artificial Intelligence Convergence, 77 Yongbong Ro, Gwangju 500757, South Korea
Funding
National Research Foundation of Singapore;
Keywords
vocal burst; self-supervised model; self-relation attention; temporal awareness; SPEECH; VOICE;
DOI
10.3390/s23010200
Chinese Library Classification
O65 [Analytical Chemistry];
Discipline Codes
070302; 081704;
Abstract
Speech emotion recognition (SER) has recently attracted considerable research interest. Although much research has been conducted on this topic, emotion recognition from non-verbal speech (known as the vocal burst) remains sparse. The vocal burst is short and carries no semantic content, which makes it harder to handle than verbal speech. Therefore, in this paper, we propose a self-relation attention and temporal awareness (SRA-TA) module to tackle this problem, capturing long-term dependencies while focusing on the salient parts of the audio signal. Our proposed method consists of three main stages. First, latent features are extracted from the raw audio signal and its Mel-spectrogram using a self-supervised learning model. After the SRA-TA module captures the valuable information from the latent features, all features are concatenated and fed into ten individual fully-connected layers to predict the scores of 10 emotions. Our proposed method achieves a mean concordance correlation coefficient (CCC) of 0.7295 on the test set, ranking first in the high-dimensional emotion task of the 2022 ACII Affective Vocal Burst Workshop & Challenge.
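The mean CCC reported above is a standard evaluation metric for dimensional emotion prediction. A minimal sketch of how it could be computed over the 10 emotion dimensions (function names are illustrative, not from the paper's code):

```python
import numpy as np

def ccc(preds, targets):
    """Concordance correlation coefficient between two 1-D arrays:
    2*cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))**2)."""
    preds = np.asarray(preds, dtype=float)
    targets = np.asarray(targets, dtype=float)
    mean_p, mean_t = preds.mean(), targets.mean()
    var_p, var_t = preds.var(), targets.var()
    cov = ((preds - mean_p) * (targets - mean_t)).mean()
    return 2.0 * cov / (var_p + var_t + (mean_p - mean_t) ** 2)

def mean_ccc(pred_matrix, target_matrix):
    """Mean CCC across emotion dimensions (one column per emotion)."""
    pred_matrix = np.asarray(pred_matrix, dtype=float)
    target_matrix = np.asarray(target_matrix, dtype=float)
    return float(np.mean([ccc(pred_matrix[:, i], target_matrix[:, i])
                          for i in range(pred_matrix.shape[1])]))
```

Unlike the Pearson correlation, the CCC also penalizes differences in mean and scale between predictions and targets, which is why it is the preferred metric in this challenge.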
Pages: 13
Related Papers
50 records in total
  • [31] Hybrid Network Using Dynamic Graph Convolution and Temporal Self-Attention for EEG-Based Emotion Recognition
    Dalian University of Technology, Department of Computer Science and Technology, Dalian 116024, China
    [institution not specified], 313000, China
    IEEE Trans. Neural Networks Learn. Syst., 12: 18565-18575
  • [32] Self-Attention GAN for EEG Data Augmentation and Emotion Recognition
    Chen, Jingxia
    Tang, Zhezhe
    Lin, Wentao
    Hu, Kailei
    Xie, Jia
    Computer Engineering and Applications, 2024, 59 (05) : 160 - 168
  • [33] Region Adaptive Self-Attention for an Accurate Facial Emotion Recognition
    Lee, Seongmin
    Lee, Jeonghaeng
    Kim, Minsik
    Lee, Sanghoon
    PROCEEDINGS OF 2022 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2022, : 791 - 796
  • [34] Emotion Recognition via Multiscale Feature Fusion Network and Attention Mechanism
    Jiang, Yiye
    Xie, Songyun
    Xie, Xinzhou
    Cui, Yujie
    Tang, Hao
    IEEE SENSORS JOURNAL, 2023, 23 (10) : 10790 - 10800
  • [35] Speech Emotion Recognition via Multi-Level Attention Network
    Liu, Ke
    Wang, Dekui
    Wu, Dongya
    Liu, Yutao
    Feng, Jun
    IEEE SIGNAL PROCESSING LETTERS, 2022, 29 : 2278 - 2282
  • [36] BAT: Block and token self-attention for speech emotion recognition
    Lei, Jianjun
    Zhu, Xiangwei
    Wang, Ying
    Neural Networks, 2022, 156 : 67 - 80
  • [37] Feeling backwards? How temporal order in speech affects the time course of vocal emotion recognition
    Rigoulot, Simon
    Wassiliwizky, Eugen
    Pell, Marc D.
    FRONTIERS IN PSYCHOLOGY, 2013, 4
  • [38] IS CROSS-ATTENTION PREFERABLE TO SELF-ATTENTION FOR MULTI-MODAL EMOTION RECOGNITION?
    Rajan, Vandana
    Brutti, Alessio
    Cavallaro, Andrea
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 4693 - 4697
  • [39] Happy Emotion Recognition in Videos Via Apex Spotting and Temporal Models
    Samadiani, Najmeh
    Huang, Guangyan
    2020 IEEE/WIC/ACM INTERNATIONAL JOINT CONFERENCE ON WEB INTELLIGENCE AND INTELLIGENT AGENT TECHNOLOGY (WI-IAT 2020), 2020, : 514 - 519
  • [40] Robot Self-Awareness: Temporal Relation Based Data Mining
    Gorbenko, Anna
    Popov, Vladimir
    Sheka, Andrey
    ENGINEERING LETTERS, 2011, 19 (03) : 169 - 178