Spatiotemporal and frequential cascaded attention networks for speech emotion recognition

Authors
Li, Shuzhen [1]
Xing, Xiaofen [2]
Fan, Weiquan [2]
Cai, Bolun [3]
Fordson, Perry [2]
Xu, Xiangmin [2]
Affiliations
[1] School of Computer Science and Engineering, South China University of Technology, Guangzhou, China
[2] School of Electronic and Information Engineering & UBTECH-SCUT Joint Research Lab, South China University of Technology, Guangzhou, China
[3] Tencent WeChat AI, Guangzhou, China
Abstract
Speech emotion recognition is an important but difficult task in human–computer interaction systems. One of its main challenges is how to extract effective emotion features from a long utterance. To address this issue, we propose a novel spatiotemporal and frequential cascaded attention network with large-margin learning. Spatiotemporal attention selectively locates the targeted emotional regions in a long speech spectrogram. Within these targeted regions, frequential attention captures the emotional features according to their frequency distribution. The cascaded attention helps the neural network gradually extract effective emotion features from the long spectrogram. During training, large-margin learning is applied to improve intra-class compactness and enlarge inter-class distances. Experiments on four public datasets demonstrate that the proposed model achieves promising performance in speech emotion recognition. © 2021
Pages: 238-248