Spatiotemporal and frequential cascaded attention networks for speech emotion recognition

Cited by: 0
Authors
Li, Shuzhen [1]
Xing, Xiaofen [2]
Fan, Weiquan [2]
Cai, Bolun [3]
Fordson, Perry [2]
Xu, Xiangmin [2]
Affiliations
[1] School of Computer Science and Engineering, South China University of Technology, Guangzhou, China
[2] School of Electronic and Information Engineering & UBTECH-SCUT Joint Research Lab, South China University of Technology, Guangzhou, China
[3] Tencent WeChat AI, Guangzhou, China
DOI: Not available
Abstract
Speech emotion recognition is an important but difficult task in human–computer interaction systems. One of its main challenges is extracting effective emotion features from a long utterance. To address this issue, we propose a novel spatiotemporal and frequential cascaded attention network with large-margin learning. Spatiotemporal attention selectively locates the target emotional regions in a long speech spectrogram. Within these regions, frequential attention captures emotional features from the frequency distribution. The cascaded attention helps the neural network progressively extract effective emotion features from the long spectrogram. During training, large-margin learning is applied to improve intra-class compactness and enlarge inter-class distances. Experiments on four public datasets demonstrate that the proposed model achieves promising performance in speech emotion recognition. © 2021
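The abstract gives no equations or layer details, so the following is only a minimal pure-Python sketch of the general idea under loudly stated assumptions. It is not the authors' model. (1) The spatiotemporal stage is simplified to attention over time frames, driven by a hypothetical projection vector `w_time`. (2) The frequential stage pools the time-weighted frames into one value per frequency bin and reweights the bins with a hypothetical vector `w_freq`. (3) Large-margin learning is illustrated with an additive-margin softmax (AM-Softmax/CosFace style), a swapped-in example of a large-margin loss that may differ from the loss used in the paper. All function names are hypothetical, and in a real network the weights would be learned rather than fixed.

```python
# Hypothetical sketch, NOT the paper's implementation: names, the simplified
# time-then-frequency attention, and the AM-Softmax loss are all assumptions.
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_attention(spec, w_time):
    """Stage 1 (simplified stand-in for spatiotemporal attention):
    score each time frame with the projection w_time, normalise over time."""
    scores = [sum(x * w for x, w in zip(frame, w_time)) for frame in spec]
    return softmax(scores)


def frequential_attention(spec, alpha, w_freq):
    """Stage 2: pool the time-weighted spectrogram into one value per
    frequency bin, then normalise per-bin scores over frequency."""
    n_freq = len(spec[0])
    profile = [sum(a * frame[f] for a, frame in zip(alpha, spec))
               for f in range(n_freq)]
    beta = softmax([p * w for p, w in zip(profile, w_freq)])
    return profile, beta


def cascaded_features(spec, w_time, w_freq):
    """Cascade the stages: time weights feed the frequency stage, and the
    feature is the frequency profile reweighted by the frequency weights."""
    alpha = temporal_attention(spec, w_time)
    profile, beta = frequential_attention(spec, alpha, w_freq)
    return [b * p for b, p in zip(beta, profile)], alpha, beta


def am_softmax_loss(feature, class_weights, label, s=30.0, m=0.35):
    """Additive-margin softmax cross-entropy (assumed example of a
    large-margin loss). Subtracting m from the true-class cosine means the
    true class must beat the others by roughly m before the loss gets small."""
    def norm(v):
        return math.sqrt(sum(x * x for x in v))

    f_norm = norm(feature)
    cosines = [sum(a * b for a, b in zip(w, feature)) / (norm(w) * f_norm)
               for w in class_weights]
    logits = [s * (c - m) if i == label else s * c
              for i, c in enumerate(cosines)]
    top = max(logits)
    log_sum_exp = top + math.log(sum(math.exp(z - top) for z in logits))
    return log_sum_exp - logits[label]


if __name__ == "__main__":
    # Toy 4-frame x 3-bin "spectrogram"; frame 2 carries the most energy.
    spec = [[0.1, 0.2, 0.1],
            [0.2, 0.1, 0.3],
            [1.0, 0.9, 1.2],
            [0.1, 0.1, 0.2]]
    feats, alpha, beta = cascaded_features(spec, [1.0, 1.0, 1.0],
                                           [0.5, 1.0, 0.5])
    # Three hypothetical emotion classes with fixed, illustrative weights.
    classes = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    print("time weights:", [round(a, 3) for a in alpha])
    print("freq weights:", [round(b, 3) for b in beta])
    print("loss, no margin:", am_softmax_loss(feats, classes, 2, m=0.0))
    print("loss, margin 0.35:", am_softmax_loss(feats, classes, 2))
```

In this toy run the feature is already closest to class 2, so the plain softmax loss is modest, but the margin version still penalises it because the winning cosine leads the runner-up by only a small gap. That extra pressure toward a clear separation is the intuition behind "intra-class compactness" and "inter-class distances" in the abstract.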
Pages: 238-248