Self-Relation Attention and Temporal Awareness for Emotion Recognition via Vocal Burst

Cited by: 2
Authors
Trinh, Dang-Linh [1 ]
Vo, Minh-Cong [1 ]
Kim, Soo-Hyung [1 ]
Yang, Hyung-Jeong [1 ]
Lee, Guee-Sang [1 ]
Affiliations
[1] Chonnam Natl Univ, Dept Artificial Intelligence Convergence, 77 Yongbong Ro, Gwangju 500757, South Korea
Funding
National Research Foundation of Singapore;
Keywords
vocal burst; self-supervised model; self-relation attention; temporal awareness; SPEECH; VOICE;
DOI
10.3390/s23010200
Chinese Library Classification
O65 [Analytical Chemistry];
Discipline Codes
070302; 081704;
Abstract
Speech emotion recognition (SER) has recently attracted considerable research interest. Although much research has been conducted on this topic, emotion recognition from non-verbal speech (known as the vocal burst) remains sparse. The vocal burst is short and carries no semantic content, which makes it harder to handle than verbal speech. Therefore, in this paper, we propose a self-relation attention and temporal awareness (SRA-TA) module to tackle this problem, capturing long-term dependencies while focusing on the salient parts of the audio signal. Our proposed method consists of three main stages. First, latent features are extracted from the raw audio signal and its Mel-spectrogram using a self-supervised learning model. After the SRA-TA module captures the valuable information from the latent features, all features are concatenated and fed into ten individual fully-connected layers to predict the scores of 10 emotions. Our proposed method achieves a mean concordance correlation coefficient (CCC) of 0.7295 on the test set, ranking first in the high-dimensional emotion task of the 2022 ACII Affective Vocal Burst Workshop & Challenge.
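The mean CCC reported above is a standard evaluation metric for dimensional emotion prediction. A minimal sketch of how it could be computed over the 10 emotion dimensions (function names are illustrative, not from the paper's code):

```python
import numpy as np

def ccc(preds, targets):
    """Concordance correlation coefficient between two 1-D arrays:
    2*cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))**2)."""
    preds = np.asarray(preds, dtype=float)
    targets = np.asarray(targets, dtype=float)
    mean_p, mean_t = preds.mean(), targets.mean()
    var_p, var_t = preds.var(), targets.var()
    cov = ((preds - mean_p) * (targets - mean_t)).mean()
    return 2.0 * cov / (var_p + var_t + (mean_p - mean_t) ** 2)

def mean_ccc(pred_matrix, target_matrix):
    """Mean CCC across emotion dimensions (one column per emotion)."""
    pred_matrix = np.asarray(pred_matrix, dtype=float)
    target_matrix = np.asarray(target_matrix, dtype=float)
    return float(np.mean([ccc(pred_matrix[:, i], target_matrix[:, i])
                          for i in range(pred_matrix.shape[1])]))
```

Unlike the Pearson correlation, the CCC also penalizes differences in mean and scale between predictions and targets, which is why it is the preferred metric in this challenge.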
Pages: 13
Related Papers
50 records in total
  • [31] Hybrid Network Using Dynamic Graph Convolution and Temporal Self-Attention for EEG-Based Emotion Recognition
    Dalian University of Technology, Department of Computer Science and Technology, Dalian 116024, China
    [institution not specified], 313000, China
    IEEE Trans. Neural Networks Learn. Syst., 12: 18565-18575
  • [32] Self-Attention GAN for EEG Data Augmentation and Emotion Recognition
    Chen, Jingxia
    Tang, Zhezhe
    Lin, Wentao
    Hu, Kailei
    Xie, Jia
    Computer Engineering and Applications, 2024, 59 (05) : 160 - 168
  • [33] Region Adaptive Self-Attention for an Accurate Facial Emotion Recognition
    Lee, Seongmin
    Lee, Jeonghaeng
    Kim, Minsik
    Lee, Sanghoon
    PROCEEDINGS OF 2022 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2022, : 791 - 796
  • [34] Emotion Recognition via Multiscale Feature Fusion Network and Attention Mechanism
    Jiang, Yiye
    Xie, Songyun
    Xie, Xinzhou
    Cui, Yujie
    Tang, Hao
    IEEE SENSORS JOURNAL, 2023, 23 (10) : 10790 - 10800
  • [35] Speech Emotion Recognition via Multi-Level Attention Network
    Liu, Ke
    Wang, Dekui
    Wu, Dongya
    Liu, Yutao
    Feng, Jun
    IEEE SIGNAL PROCESSING LETTERS, 2022, 29 : 2278 - 2282
  • [36] BAT: Block and token self-attention for speech emotion recognition
    Lei, Jianjun
    Zhu, Xiangwei
    Wang, Ying
    Neural Networks, 2022, 156 : 67 - 80
  • [37] Feeling backwards? How temporal order in speech affects the time course of vocal emotion recognition
    Rigoulot, Simon
    Wassiliwizky, Eugen
    Pell, Marc D.
    FRONTIERS IN PSYCHOLOGY, 2013, 4
  • [38] IS CROSS-ATTENTION PREFERABLE TO SELF-ATTENTION FOR MULTI-MODAL EMOTION RECOGNITION?
    Rajan, Vandana
    Brutti, Alessio
    Cavallaro, Andrea
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 4693 - 4697
  • [39] Happy Emotion Recognition in Videos Via Apex Spotting and Temporal Models
    Samadiani, Najmeh
    Huang, Guangyan
    2020 IEEE/WIC/ACM INTERNATIONAL JOINT CONFERENCE ON WEB INTELLIGENCE AND INTELLIGENT AGENT TECHNOLOGY (WI-IAT 2020), 2020, : 514 - 519
  • [40] Robot Self-Awareness: Temporal Relation Based Data Mining
    Gorbenko, Anna
    Popov, Vladimir
    Sheka, Andrey
    ENGINEERING LETTERS, 2011, 19 (03) : 169 - 178