Speech Emotion Recognition via Multi-Level Attention Network

Cited by: 8
Authors
Liu, Ke [1]
Wang, Dekui [1]
Wu, Dongya [1]
Liu, Yutao [1]
Feng, Jun [1]
Affiliations
[1] Northwest University, School of Information Science and Technology, Xi'an 710127, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
MFCC; multi-scale feature; attention mechanism; speech emotion recognition; NEURAL-NETWORK; FEATURES
DOI
10.1109/LSP.2022.3219352
Chinese Library Classification
TM [Electrical engineering]; TN [Electronic and communication technology]
Discipline classification codes
0808; 0809
Abstract
To improve the performance of human speech emotion recognition (SER), existing work has made great progress using the popular mel-scale frequency cepstral coefficient (MFCC). However, existing work rarely attends to the low-level emotion-related features in MFCC, such as their underlying interactive relations. In this letter, we propose a novel multi-level attention network (MLAnet), which contains a multi-scale low-level feature (MLF) extractor and a multi-unit attention (MUA) module. Within the MLF extractor, an attention mechanism minimizes the task-irrelevant information that harms SER performance. Because the features extracted by the MLF extractor contain rich domain-specific emotion information, we further present an MUA module that simultaneously weights the features along the time, frequency, and channel dimensions. In this way, the discriminative emotion features in each dimension can be extracted by the corresponding weighting block. Experimental results on two benchmark datasets demonstrate that the proposed method outperforms other state-of-the-art approaches.
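The abstract's idea of weighting a feature map along the time, frequency, and channel dimensions can be illustrated with a minimal sketch. This is not the authors' MUA module: the paper's weighting blocks are learned, whereas this sketch assumes a parameter-free scheme (mean pooling followed by a softmax), and the names `softmax`, `axis_weights`, and `reweight` are hypothetical.

```python
import math

def softmax(values):
    """Numerically stable softmax over a flat list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def axis_weights(feat, axis):
    """One weight per index along `axis` (0=channel, 1=frequency, 2=time).

    Assumption: scores are plain means over the other two axes; the
    paper's learned weighting blocks are not reproduced here.
    """
    n_c, n_f, n_t = len(feat), len(feat[0]), len(feat[0][0])
    sizes = (n_c, n_f, n_t)
    sums = [0.0] * sizes[axis]
    for c in range(n_c):
        for f in range(n_f):
            for t in range(n_t):
                sums[(c, f, t)[axis]] += feat[c][f][t]
    per_index = (n_c * n_f * n_t) / sizes[axis]  # elements averaged per index
    return softmax([s / per_index for s in sums])

def reweight(feat):
    """Scale each element by its channel, frequency, and time weights."""
    w_c = axis_weights(feat, 0)
    w_f = axis_weights(feat, 1)
    w_t = axis_weights(feat, 2)
    return [
        [[feat[c][f][t] * w_c[c] * w_f[f] * w_t[t]
          for t in range(len(w_t))]
         for f in range(len(w_f))]
        for c in range(len(w_c))
    ]
```

Calling `reweight` on a nested list shaped channels × frequency bins × time frames returns a list of the same shape. In a trained network the three weight vectors would typically come from small learned layers rather than from fixed pooling.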
Pages: 2278-2282
Page count: 5
Related papers
50 in total
  • [21] Multi-level channel attention excitation network for human action recognition in videos
    Wu, Hanbo
    Ma, Xin
    Li, Yibin
    [J]. SIGNAL PROCESSING-IMAGE COMMUNICATION, 2023, 114
  • [22] A new joint CTC-attention-based speech recognition model with multi-level multi-head attention
    Chu-Xiong Qin
    Wen-Lin Zhang
    Dan Qu
    [J]. EURASIP Journal on Audio, Speech, and Music Processing, 2019
  • [23] A Multi-level Classification Approach for Facial Emotion Recognition
    Drume, Dev
    Jalal, Anand Singh
    [J]. 2012 IEEE INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND COMPUTING RESEARCH (ICCIC), 2012, : 288 - 292
  • [24] A Speech Emotion Recognition Model Based on Multi-Level Local Binary and Local Ternary Patterns
    Sonmez, Yesim Ulgen
    Varol, Asaf
    [J]. IEEE ACCESS, 2020, 8 : 190784 - 190796
  • [25] Attention Based Fully Convolutional Network for Speech Emotion Recognition
    Zhang, Yuanyuan
    Du, Jun
    Wang, Zirui
    Zhang, Jianshu
    Tu, Yanhui
    [J]. 2018 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2018, : 1771 - 1775
  • [26] A Joint Network Based on Interactive Attention for Speech Emotion Recognition
    Hu, Ying
    Hou, Shijing
    Yang, Huamin
    Huang, Hao
    He, Liang
    [J]. 2023 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, ICME, 2023, : 1715 - 1720
  • [27] DILATED RESIDUAL NETWORK WITH MULTI-HEAD SELF-ATTENTION FOR SPEECH EMOTION RECOGNITION
    Li, Runnan
    Wu, Zhiyong
    Jia, Jia
    Zhao, Sheng
    Meng, Helen
    [J]. 2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 6675 - 6679
  • [28] Automatic Concrete Damage Recognition Using Multi-Level Attention Convolutional Neural Network
    Shin, Hyun Kyu
    Ahn, Yong Han
    Lee, Sang Hyo
    Kim, Ha Young
    [J]. MATERIALS, 2020, 13 (23) : 1 - 13
  • [29] The Multi-level Approach to Speech Corpora Annotation for Automatic Speech Recognition
    Glavatskih, Igor
    Platonova, Tatyana
    Rogozhina, Valeria
    Shirokova, Anna
    Smolina, Anna
    Kotov, Mikhail
    Ovsyannikova, Anna
    Repalov, Sergey
    Zulkarneev, Mikhail
    [J]. SPEECH AND COMPUTER (SPECOM 2015), 2015, 9319 : 438 - 445
  • [30] Multi-Level Ensemble Network for Scene Recognition
    Zhang, Longhao
    Li, Lingqiao
    Pan, Xipeng
    Cao, Zhiwei
    Chen, Qianyu
    Yang, Huihua
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2019, 78 (19) : 28209 - 28230