Spatiotemporal and frequential cascaded attention networks for speech emotion recognition

Cited by: 0
Authors
Li, Shuzhen [1]
Xing, Xiaofen [2]
Fan, Weiquan [2]
Cai, Bolun [3]
Fordson, Perry [2]
Xu, Xiangmin [2]
Affiliations
[1] School of Computer Science and Engineering, South China University of Technology, Guangzhou, China
[2] School of Electronic and Information Engineering & UBTECH-SCUT Joint Research Lab, South China University of Technology, Guangzhou, China
[3] Tencent WeChat AI, Guangzhou, China
DOI: Not available
Abstract
Speech emotion recognition is an important but difficult task in human–computer interaction systems. One of its main challenges is extracting effective emotion features from a long utterance. To address this issue, this paper proposes a novel spatiotemporal and frequential cascaded attention network with large-margin learning. Spatiotemporal attention selectively locates the target emotional regions within a long speech spectrogram. Within these regions, frequential attention captures emotional features from their frequency distribution. The cascaded attention helps the neural network progressively extract effective emotion features from the long spectrogram. During training, large-margin learning is applied to improve intra-class compactness and enlarge inter-class distances. Experiments on four public datasets demonstrate that the proposed model achieves promising performance in speech emotion recognition. © 2021
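The abstract does not give the model's equations, so the NumPy sketch below only illustrates the general shape of such a cascade under stated assumptions; it is not the authors' method. Assumed here: a per-cell sigmoid gate over a (time, frequency, channel) feature map stands in for spatiotemporal attention; a softmax over frequency bands of the time-pooled, gated map stands in for frequential attention; and an additive-margin softmax (AM-Softmax/CosFace style) stands in for large-margin learning. All function names, the weight vectors `w_st` and `w_f`, and the values of `s` and `m` are hypothetical.

```python
import numpy as np

def sigmoid(x):
    # Logistic function, used as a soft gate in (0, 1).
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    # Numerically stable softmax over a 1-D array.
    z = np.exp(x - x.max())
    return z / z.sum()

def spatiotemporal_attention(feat, w_st):
    """Hypothetical stage 1: gate every (time, frequency) cell of a
    (T, F, C) feature map so emotion-salient regions keep more energy."""
    mask = sigmoid(feat @ w_st)             # (T, F), one gate per cell
    return feat * mask[..., None], mask

def frequential_attention(feat, w_f):
    """Hypothetical stage 2: pool the gated map over time, score each
    frequency band, and return a band-weighted utterance embedding."""
    pooled = feat.mean(axis=0)              # (F, C)
    band_w = softmax(pooled @ w_f)          # (F,), weights sum to 1
    return band_w @ pooled, band_w          # (C,) embedding

def am_softmax_loss(emb, W, label, s=30.0, m=0.35):
    """Additive-margin softmax, used only as one example of a large-margin
    loss: the target-class cosine is reduced by m before scaling by s."""
    e = emb / np.linalg.norm(emb)
    Wn = W / np.linalg.norm(W, axis=0, keepdims=True)  # unit class vectors
    logits = s * (e @ Wn)                              # (K,) scaled cosines
    logits[label] -= s * m                             # margin on target only
    shifted = logits - logits.max()
    return -(shifted[label] - np.log(np.exp(shifted).sum()))

# Usage on random stand-in features (not real spectrogram data).
rng = np.random.default_rng(0)
T, F, C, K = 50, 40, 8, 4                   # frames, bands, channels, classes
feat = rng.standard_normal((T, F, C))
gated, mask = spatiotemporal_attention(feat, rng.standard_normal(C))
emb, band_w = frequential_attention(gated, rng.standard_normal(C))
W = rng.standard_normal((C, K))
loss_margin = am_softmax_loss(emb, W, label=2)
loss_plain = am_softmax_loss(emb, W, label=2, m=0.0)
print(gated.shape, emb.shape, loss_plain, loss_margin)
```

The cascade is expressed simply as function composition: the frequency stage only sees the output of the spatiotemporal gate. The margin makes the loss no smaller than the plain cosine softmax for the same embedding, which is what pushes training toward tighter classes and larger gaps between them.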
Pages: 238-248
Related papers
50 in total
  • [31] Correlated Attention Networks for Multimodal Emotion Recognition
    Qiu, Jie-Lin
    Li, Xiao-Yu
    Hu, Kai
    PROCEEDINGS 2018 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE (BIBM), 2018, pp. 2656-2660
  • [32] SPEECH EMOTION RECOGNITION USING CAPSULE NETWORKS
    Wu, Xixin
    Liu, Songxiang
    Cao, Yuewen
    Li, Xu
    Yu, Jianwei
    Dai, Dongyang
    Ma, Xi
    Hu, Shoukang
    Wu, Zhiyong
    Liu, Xunying
    Meng, Helen
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, pp. 6695-6699
  • [33] Continuous Emotion Recognition with Spatiotemporal Convolutional Neural Networks
    Teixeira, Thomas
    Granger, Eric
    Lameiras Koerich, Alessandro
    APPLIED SCIENCES-BASEL, 2021, 11 (24)
  • [34] Multi-head attention fusion networks for multi-modal speech emotion recognition
    Zhang, Junfeng
    Xing, Lining
    Tan, Zhen
    Wang, Hongsen
    Wang, Kesheng
    COMPUTERS & INDUSTRIAL ENGINEERING, 2022, 168
  • [35] 3-D Convolutional Recurrent Neural Networks With Attention Model for Speech Emotion Recognition
    Chen, Mingyi
    He, Xuanji
    Yang, Jing
    Zhang, Han
    IEEE SIGNAL PROCESSING LETTERS, 2018, 25 (10), pp. 1440-1444
  • [36] Speech Emotion Recognition Based on Speech Segment Using LSTM with Attention Model
    Atmaja, Bagus Tris
    Akagi, Masato
    2019 IEEE INTERNATIONAL CONFERENCE ON SIGNALS AND SYSTEMS (ICSIGSYS), 2019, pp. 40-44
  • [37] Attention Based Fully Convolutional Network for Speech Emotion Recognition
    Zhang, Yuanyuan
    Du, Jun
    Wang, Zirui
    Zhang, Jianshu
    Tu, Yanhui
    2018 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2018, pp. 1771-1775
  • [38] A Joint Network Based on Interactive Attention for Speech Emotion Recognition
    Hu, Ying
    Hou, Shijing
    Yang, Huamin
    Huang, Hao
    He, Liang
    2023 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, ICME, 2023, pp. 1715-1720
  • [39] CONTEXT-AWARE ATTENTION MECHANISM FOR SPEECH EMOTION RECOGNITION
    Ramet, Gaetan
    Garner, Philip N.
    Baeriswyl, Michael
    Lazaridis, Alexandros
    2018 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2018), 2018, pp. 126-131
  • [40] SPEECH EMOTION RECOGNITION WITH MULTISCALE AREA ATTENTION AND DATA AUGMENTATION
    Xu, Mingke
    Zhang, Fan
    Cui, Xiaodong
    Zhang, Wei
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, pp. 6319-6323