Self-Relation Attention and Temporal Awareness for Emotion Recognition via Vocal Burst

Cited by: 2
Authors
Trinh, Dang-Linh [1]
Vo, Minh-Cong [1]
Kim, Soo-Hyung [1]
Yang, Hyung-Jeong [1]
Lee, Guee-Sang [1]
Affiliations
[1] Chonnam Natl Univ, Dept Artificial Intelligence Convergence, 77 Yongbong Ro, Gwangju 500757, South Korea
Funding
National Research Foundation, Singapore;
Keywords
vocal burst; self-supervised model; self-relation attention; temporal awareness; SPEECH; VOICE;
DOI
10.3390/s23010200
Chinese Library Classification (CLC) number
O65 [Analytical Chemistry];
Discipline classification code
070302; 081704;
Abstract
Speech emotion recognition (SER) has attracted considerable research attention in recent years. Despite this activity, work on emotion recognition from non-verbal speech (known as the vocal burst) remains sparse. A vocal burst is short and carries no linguistic content, which makes it harder to handle than verbal speech. In this paper, we therefore propose a self-relation attention and temporal awareness (SRA-TA) module for vocal bursts, which captures long-term dependencies and focuses on the salient parts of the audio signal. Our proposed method contains three main stages. First, latent features are extracted from the raw audio signal and its Mel-spectrogram using a self-supervised learning model. Second, the SRA-TA module captures the valuable information in these latent features. Finally, all features are concatenated and fed into ten individual fully-connected layers to predict the scores of 10 emotions. Our method achieves a mean concordance correlation coefficient (CCC) of 0.7295 on the test set, ranking first in the high-dimensional emotion task of the 2022 ACII Affective Vocal Burst Workshop & Challenge.
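The abstract reports a mean concordance correlation coefficient (CCC) over 10 emotion scores. As a rough illustration of that metric, the sketch below computes the standard (Lin's) CCC per emotion and averages it. This is not the authors' evaluation code: the function names `concordance_cc` and `mean_ccc` are hypothetical, and the use of population (biased) variance is an assumption, since the record does not state which estimator the challenge uses.

```python
import numpy as np


def concordance_cc(y_true, y_pred):
    """Lin's concordance correlation coefficient between two 1-D sequences.

    Assumption: population (biased) variance and covariance are used.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mean_t, mean_p = y_true.mean(), y_pred.mean()
    # Population covariance between targets and predictions.
    cov = ((y_true - mean_t) * (y_pred - mean_p)).mean()
    # CCC penalizes both low correlation and a shift in mean or scale.
    return 2.0 * cov / (y_true.var() + y_pred.var() + (mean_t - mean_p) ** 2)


def mean_ccc(y_true, y_pred):
    """Average the CCC over emotion columns of (n_samples, n_emotions) arrays."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    scores = [
        concordance_cc(y_true[:, k], y_pred[:, k])
        for k in range(y_true.shape[1])
    ]
    return float(np.mean(scores))
```

Unlike the Pearson correlation, the CCC drops below 1 when predictions are perfectly correlated with the targets but offset from them, which is why it is a common choice for scoring continuous emotion intensities.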
Pages: 13