Speech Emotion Recognition via Multi-Level Attention Network

Cited by: 8

Authors:

Liu, Ke [1]

Wang, Dekui [1]

Wu, Dongya [1]

Liu, Yutao [1]

Feng, Jun [1]

Affiliation:

[1] Northwest Univ, Sch Informat Sci & Technol, Xian 710127, Peoples R China

Source:

IEEE SIGNAL PROCESSING LETTERS | 2022 / Vol. 29

Funding:

National Natural Science Foundation of China;

Keywords:

MFCC; multi-scale feature; attention mechanism; speech emotion recognition; NEURAL-NETWORK; FEATURES;

DOI:

10.1109/LSP.2022.3219352

Chinese Library Classification:

TM [Electrical Engineering]; TN [Electronic and Communication Technology];

Discipline classification codes:

0808 ; 0809 ;

Abstract:

Aiming to improve the performance of human speech emotion recognition (SER), the existing work has made great progress based on the popular mel-scale frequency cepstral coefficient (MFCC). However, the existing work rarely pays attention to the low-level emotion related features in MFCC, such as the underlying interactive relations. In this letter, we propose a novel multi-level attention network (MLAnet), which contains a multi-scale low-level feature (MLF) extractor and a multi-unit attention (MUA) module. Within the MLF extractor, we minimize the task-irrelevant information which harms the performance of SER by applying the attention mechanism. Since the features extracted by the MLF extractor contain rich domain-specific emotion information, we further present a MUA module to simultaneously weight the features in terms of time, frequency and channel dimensions. In this way, the discriminative emotion features in different dimensions can be extracted by corresponding weighting blocks. Experimental results on two benchmark datasets demonstrate that the proposed method outperforms other state-of-the-art approaches.
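
The abstract describes weighting features along the time, frequency and channel dimensions. The sketch below only illustrates that general idea and is not the authors' MUA module, whose details this record does not give. It assumes a feature map stored as nested lists indexed `[channel][frequency][time]` and uses a parameter-free sigmoid gate over mean activations, an assumption chosen for brevity; a trained model would learn these weights. The names `axis_means` and `reweight` are hypothetical.

```python
import math


def sigmoid(v):
    """Squash a real value into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-v))


def axis_means(x):
    """Return mean activation per channel, per frequency bin, and per time frame."""
    C, F, T = len(x), len(x[0]), len(x[0][0])
    ch = [sum(x[c][f][t] for f in range(F) for t in range(T)) / (F * T)
          for c in range(C)]
    fr = [sum(x[c][f][t] for c in range(C) for t in range(T)) / (C * T)
          for f in range(F)]
    tm = [sum(x[c][f][t] for c in range(C) for f in range(F)) / (C * F)
          for t in range(T)]
    return ch, fr, tm


def reweight(x):
    """Scale every element by one gate per channel, frequency bin, and time frame."""
    C, F, T = len(x), len(x[0]), len(x[0][0])
    ch, fr, tm = axis_means(x)
    # Assumption: gates come straight from the means, with no learned parameters.
    wc = [sigmoid(v) for v in ch]
    wf = [sigmoid(v) for v in fr]
    wt = [sigmoid(v) for v in tm]
    return [[[x[c][f][t] * wc[c] * wf[f] * wt[t] for t in range(T)]
             for f in range(F)]
            for c in range(C)]


# Toy input: 2 channels, 2 frequency bins, 3 time frames.
features = [
    [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
    [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
]
weighted = reweight(features)
```

Because each gate lies strictly between 0 and 1, the output keeps the input's shape and every non-negative element is scaled down, with larger mean activations along an axis receiving gates closer to 1.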


Pages: 2278-2282

Page count: 5
