Speech Emotion Recognition via Multi-Level Attention Network

Cited by: 8
Authors
Liu, Ke [1]
Wang, Dekui [1]
Wu, Dongya [1]
Liu, Yutao [1]
Feng, Jun [1]
Affiliations
[1] Northwest Univ, Sch Informat Sci & Technol, Xian 710127, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
MFCC; multi-scale feature; attention mechanism; speech emotion recognition; NEURAL-NETWORK; FEATURES;
DOI
10.1109/LSP.2022.3219352
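Chinese Library Classification
TM [Electrical engineering]; TN [Electronic and communication technology]
Discipline classification codes
0808; 0809
Abstract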
Existing work on human speech emotion recognition (SER) has made great progress using the popular mel-scale frequency cepstral coefficient (MFCC). However, it rarely pays attention to the low-level emotion-related features in MFCC, such as the underlying interactive relations. In this letter, we propose a novel multi-level attention network (MLAnet), which contains a multi-scale low-level feature (MLF) extractor and a multi-unit attention (MUA) module. Within the MLF extractor, we apply the attention mechanism to minimize task-irrelevant information that harms SER performance. Since the features extracted by the MLF extractor contain rich domain-specific emotion information, we further present an MUA module to simultaneously weight the features along the time, frequency, and channel dimensions. In this way, the discriminative emotion features in each dimension can be extracted by the corresponding weighting block. Experimental results on two benchmark datasets demonstrate that the proposed method outperforms other state-of-the-art approaches.
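The idea of weighting a feature map along the time, frequency, and channel dimensions can be illustrated with a small dependency-free sketch. This is not the authors' MUA module: the abstract does not specify how the weights are computed, so the gating used here (mean over the other two axes, followed by a sigmoid) is an assumption, and the names `axis_weights` and `reweight` are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def axis_weights(feat, axis):
    """Return one sigmoid gate per index along `axis`.

    `feat` is a nested list shaped (channel, frequency, time).
    Each gate is computed from the mean over the other two axes
    (an assumed, simplified stand-in for a learned attention block).
    """
    sizes = (len(feat), len(feat[0]), len(feat[0][0]))
    sums = [0.0] * sizes[axis]
    for c in range(sizes[0]):
        for f in range(sizes[1]):
            for t in range(sizes[2]):
                sums[(c, f, t)[axis]] += feat[c][f][t]
    count = (sizes[0] * sizes[1] * sizes[2]) / sizes[axis]
    return [sigmoid(s / count) for s in sums]


def reweight(feat):
    """Scale every element by its channel, frequency, and time gates."""
    wc, wf, wt = (axis_weights(feat, a) for a in range(3))
    return [[[feat[c][f][t] * wc[c] * wf[f] * wt[t]
              for t in range(len(wt))]
             for f in range(len(wf))]
            for c in range(len(wc))]


# Toy MFCC-like map: 1 channel, 2 frequency bins, 2 time frames.
weighted = reweight([[[1.0, 1.0], [1.0, 1.0]]])
```

In a trained network the gates would come from learned layers rather than a fixed mean-and-sigmoid rule; the sketch only shows how three per-axis weightings combine on one tensor.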
Pages: 2278-2282
Page count: 5
Related papers
50 in total
  • [1] Multi-level attention fusion network assisted by relative entropy alignment for multimodal speech emotion recognition
    Lei, Jianjun
    Wang, Jing
    Wang, Ying
    [J]. APPLIED INTELLIGENCE, 2024, 54(17-18): 8478-8490
  • [2] Concept-guided multi-level attention network for image emotion recognition
    Yang, Hansen
    Fan, Yangyu
    Lv, Guoyun
    Liu, Shiya
    Guo, Zhe
    [J]. SIGNAL IMAGE AND VIDEO PROCESSING, 2024, 18(05): 4313-4326
  • [3] SPEECH EMOTION RECOGNITION WITH CO-ATTENTION BASED MULTI-LEVEL ACOUSTIC INFORMATION
    Zou, Heqing
    Si, Yuke
    Chen, Chen
    Rajan, Deepu
    Chng, Eng Siong
    [J]. 2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022: 7367-7371
  • [4] Knowledge enhancement for speech emotion recognition via multi-level acoustic feature
    Zhao, Huan
    Huang, Nianxin
    Chen, Haijiao
    [J]. CONNECTION SCIENCE, 2024, 36(01)
  • [5] Speech Emotion Recognition via Multi-Level Cross-Modal Distillation
    Li, Ruichen
    Zhao, Jinming
    Jin, Qin
    [J]. INTERSPEECH 2021, 2021: 4488-4492
  • [6] An Ensemble Model for Multi-Level Speech Emotion Recognition
    Zheng, Chunjun
    Wang, Chunli
    Jia, Ning
    [J]. APPLIED SCIENCES-BASEL, 2020, 10(01)
  • [7] Multimodal Approach of Speech Emotion Recognition Using Multi-Level Multi-Head Fusion Attention-Based Recurrent Neural Network
    Ngoc-Huynh Ho
    Yang, Hyung-Jeong
    Kim, Soo-Hyung
    Lee, Gueesang
    [J]. IEEE ACCESS, 2020, 8: 61672-61686
  • [8] Low-Order Multi-Level Features for Speech Emotion Recognition
    Tamulevicius, Gintautas
    Liogiene, Tatjana
    [J]. BALTIC JOURNAL OF MODERN COMPUTING, 2015, 3(04): 234-247
  • [9] Image Emotion Recognition via Fusion Multi-Level Representations
    Zhang, Hao
    Li, Haipeng
    Peng, Guoqin
    Liu, Yan'an
    Xu, Dan
    [J]. Jisuanji Fuzhu Sheji Yu Tuxingxue Xuebao/Journal of Computer-Aided Design and Computer Graphics, 2023, 35(10): 1566-1576
  • [10] MDAN: Multi-level Dependent Attention Network for Visual Emotion Analysis
    Xu, Liwen
    Wang, Zhengtao
    Wu, Bin
    Lui, Simon
    [J]. 2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2022: 9469-9478