Spatiotemporal and frequential cascaded attention networks for speech emotion recognition

Cited by: 0
Authors
Li, Shuzhen [1]
Xing, Xiaofen [2]
Fan, Weiquan [2]
Cai, Bolun [3]
Fordson, Perry [2]
Xu, Xiangmin [2]
Affiliations
[1] School of Computer Science and Engineering, South China University of Technology, Guangzhou, China
[2] School of Electronic and Information Engineering & UBTECH-SCUT Joint Research Lab, South China University of Technology, Guangzhou, China
[3] Tencent WeChat AI, Guangzhou, China
DOI: not available
Abstract
Speech emotion recognition is an important but difficult task in human–computer interaction systems. One of the main challenges in speech emotion recognition is extracting effective emotion features from a long utterance. To address this issue, this paper proposes a novel spatiotemporal and frequential cascaded attention network with large-margin learning. Spatiotemporal attention selectively locates the targeted emotional regions in a long speech spectrogram. Within these regions, frequential attention captures emotional features from the frequency distribution. The cascaded attention helps the neural network gradually extract effective emotion features from the long spectrogram. During training, large-margin learning is applied to improve intra-class compactness and enlarge inter-class distances. Experiments on four public datasets demonstrate that the proposed model achieves promising performance in speech emotion recognition. © 2021
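The abstract describes the architecture only in words. As a rough illustration, and not the authors' implementation, the pure-Python sketch below shows the cascade idea: a first attention stage decides which frames of a spectrogram matter, and a second stage reweights frequency bins using only the first stage's output. Several assumptions are made: spatiotemporal attention is simplified here to attention over time frames; the scorers `w` and `u` are hypothetical learned parameters; and additive-margin softmax (the AM-softmax/CosFace loss) stands in for the paper's large-margin learning, whose exact form this record does not give. The function names and the defaults `margin=0.35` and `scale=30.0` are illustrative.

```python
# Hypothetical sketch of cascaded attention plus a large-margin loss.
# Not the paper's implementation; all names and parameters are illustrative.
import math
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_attention(spec: Matrix, w: Vector) -> Vector:
    """Stage 1 (simplified from spatiotemporal to temporal attention).

    spec is T x F (frames x frequency bins). Each frame is scored with a
    hypothetical linear scorer w, the scores are softmaxed over time, and the
    frames are pooled into a single length-F spectrum."""
    scores = [sum(wf * xf for wf, xf in zip(w, frame)) for frame in spec]
    alpha = softmax(scores)  # one weight per frame, summing to 1
    n_bins = len(spec[0])
    return [sum(a * frame[f] for a, frame in zip(alpha, spec)) for f in range(n_bins)]


def frequential_attention(spectrum: Vector, u: Vector) -> Vector:
    """Stage 2: reweight the frequency bins of the stage-1 output.

    Because it only sees the pooled spectrum, this stage works inside the
    regions that stage 1 already selected (the 'cascade')."""
    beta = softmax([uf * zf for uf, zf in zip(u, spectrum)])
    return [b * z for b, z in zip(beta, spectrum)]


def normalize(v: Vector) -> Vector:
    """L2-normalize a vector (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def am_softmax_loss(embedding: Vector, class_weights: Matrix, target: int,
                    margin: float = 0.35, scale: float = 30.0) -> float:
    """Additive-margin softmax: subtract `margin` from the target-class cosine,
    scale all cosines, then apply cross-entropy."""
    e = normalize(embedding)
    cosines = [sum(a * b for a, b in zip(e, normalize(wk))) for wk in class_weights]
    logits = [scale * (c - margin if k == target else c) for k, c in enumerate(cosines)]
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


if __name__ == "__main__":
    # Toy 4-frame x 3-bin spectrogram; frame 2 carries most of the energy.
    spec = [[0.1, 0.2, 0.1], [0.2, 0.1, 0.3], [2.0, 1.5, 0.5], [0.1, 0.1, 0.2]]
    pooled = temporal_attention(spec, w=[1.0, 1.0, 1.0])
    feature = frequential_attention(pooled, u=[1.0, 1.0, 1.0])
    class_weights = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    print(am_softmax_loss(feature, class_weights, target=0))
```

Subtracting the margin only from the target-class cosine means an embedding must beat every other class by at least `margin` before the loss becomes small. This is one common way to obtain the intra-class compactness and inter-class separation the abstract describes.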
Pages: 238-248
Related papers
  • [1] Spatiotemporal and frequential cascaded attention networks for speech emotion recognition
    Li, Shuzhen
    Xing, Xiaofen
    Fan, Weiquan
    Cai, Bolun
    Fordson, Perry
    Xu, Xiangmin
    NEUROCOMPUTING, 2021, 448 : 238 - 248
  • [2] SPATIOTEMPORAL ATTENTION BASED DEEP NEURAL NETWORKS FOR EMOTION RECOGNITION
    Lee, Jiyoung
    Kim, Sunok
    Kim, Seungryong
    Sohn, Kwanghoon
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018 : 1513 - 1517
  • [3] Self-attention transfer networks for speech emotion recognition
    Zhao, Ziping
    Wang, Keru
    Bao, Zhongtian
    Zhang, Zixing
    Cummins, Nicholas
    Sun, Shihuang
    Wang, Haishuai
    Tao, Jianhua
    Schuller, Björn W.
    VIRTUAL REALITY & INTELLIGENT HARDWARE, 2021, 3 (01) : 43 - 54
  • [4] Speech Emotion Recognition Using Cascaded Attention Network with Joint Loss for Discrimination of Confusions
    Liu, Yang
    Sun, Haoqin
    Guan, Wenbo
    Xia, Yuqi
    Zhao, Zhen
    MACHINE INTELLIGENCE RESEARCH, 2023, 20 (04) : 595 - 604
  • [5] Speech Emotion Recognition Using Convolutional Neural Networks with Attention Mechanism
    Mountzouris, Konstantinos
    Perikos, Isidoros
    Hatzilygeroudis, Ioannis
    Corchado, Juan M.
    Iglesias, Carlos A.
    Kim, Byung-Gyu
    Mehmood, Rashid
    Ren, Fuji
    Lee, In
    ELECTRONICS, 2023, 12 (20)
  • [6] Unifying spatiotemporal and frequential attention for traffic prediction
    Guo, Qi
    Tan, Qi
    Tang, Jun
    Shi, Benyun
    SCIENTIFIC REPORTS, 2025, 15 (01)
  • [7] AUTOMATIC SPEECH EMOTION RECOGNITION USING RECURRENT NEURAL NETWORKS WITH LOCAL ATTENTION
    Mirsamadi, Seyedmahdad
    Barsoum, Emad
    Zhang, Cha
    2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017 : 2227 - 2231
  • [8] Hierarchical convolutional neural networks with post-attention for speech emotion recognition
    Fan, Yonghong
    Huang, Heming
    Han, Henry
    NEUROCOMPUTING, 2025, 615
  • [9] Multiple attention convolutional-recurrent neural networks for speech emotion recognition
    Zhang, Zhihao
    Wang, Kunxia
    2022 10TH INTERNATIONAL CONFERENCE ON AFFECTIVE COMPUTING AND INTELLIGENT INTERACTION WORKSHOPS AND DEMOS, ACIIW, 2022