Self-Relation Attention and Temporal Awareness for Emotion Recognition via Vocal Burst

Cited by: 2

Authors:

Trinh, Dang-Linh [1]

Vo, Minh-Cong [1]

Kim, Soo-Hyung [1]

Yang, Hyung-Jeong [1]

Lee, Guee-Sang [1]

Affiliations:

[1] Chonnam Natl Univ, Dept Artificial Intelligence Convergence, 77 Yongbong Ro, Gwangju 500757, South Korea

Source:

SENSORS | 2023 / Vol. 23 / Issue 01

Funding:

National Research Foundation of Singapore;

Keywords:

vocal burst; self-supervised model; self-relation attention; temporal awareness; SPEECH; VOICE;

DOI:

10.3390/s23010200

Chinese Library Classification (CLC) number:

O65 [Analytical Chemistry];

Subject classification codes:

070302 ; 081704 ;

Abstract:

Speech emotion recognition (SER) is an active research topic. Although much work has recently been done on SER, research on emotion recognition from non-verbal vocalizations (known as vocal bursts) remains sparse. A vocal burst is short and carries no linguistic content, which makes it harder to handle than verbal speech. In this paper, we therefore propose a self-relation attention and temporal awareness (SRA-TA) module for vocal bursts, which captures long-term dependencies and focuses on the salient parts of the audio signal. The proposed method contains three main stages. First, latent features are extracted by a self-supervised learning model from the raw audio signal and its Mel-spectrogram. Second, the SRA-TA module captures the valuable information in these latent features. Finally, all features are concatenated and fed into ten individual fully-connected layers to predict the scores of 10 emotions. The proposed method achieves a mean concordance correlation coefficient (CCC) of 0.7295 on the test set, ranking first in the high-dimensional emotion task of the 2022 ACII Affective Vocal Burst Workshop & Challenge.
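The evaluation metric named in the abstract, the mean concordance correlation coefficient, can be illustrated with a minimal sketch. It uses the standard CCC definition with population statistics, 2·cov(x, y) / (var(x) + var(y) + (mean(x) − mean(y))²), averaged over the emotion dimensions. The function names `ccc` and `mean_ccc`, the zero-denominator handling, and the toy data are assumptions made for illustration; they are not taken from the paper or the challenge code.

```python
from statistics import fmean


def ccc(y_true, y_pred):
    """Concordance correlation coefficient between two equal-length sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_t, mean_p = fmean(y_true), fmean(y_pred)
    # Population (biased) variance and covariance, as in the usual CCC definition.
    var_t = fmean((t - mean_t) ** 2 for t in y_true)
    var_p = fmean((p - mean_p) ** 2 for p in y_pred)
    cov = fmean((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    denom = var_t + var_p + (mean_t - mean_p) ** 2
    # Assumption: return 0.0 when both sequences are the same constant.
    return 0.0 if denom == 0 else 2 * cov / denom


def mean_ccc(targets, predictions):
    """Average CCC over emotion dimensions (one list of scores per emotion)."""
    return fmean(ccc(t, p) for t, p in zip(targets, predictions))


# Toy example with two hypothetical emotion dimensions and three clips each.
targets = [[1.0, 2.0, 3.0], [1.0, 2.0, 3.0]]
predictions = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0]]
score = mean_ccc(targets, predictions)
```

In the toy example the first dimension is predicted perfectly (CCC of 1), while the second has perfect correlation but a constant offset, which CCC penalizes. This is why CCC is stricter than Pearson correlation for score regression.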


Pages: 13
