A Waveform-Feature Dual Branch Acoustic Embedding Network for Emotion Recognition

Cited by: 3

Authors:

Li, Jeng-Lin [1,2]

Huang, Tzu-Yun [1,2]

Chang, Chun-Min [1,2]

Lee, Chi-Chun [1,2]

Affiliations:

[1] Natl Tsing Hua Univ, Dept Elect Engn, Hsinchu, Taiwan

[2] MOST Joint Res Ctr AI Technol & All Vista Hlthca, Taipei, Taiwan

Source:

FRONTIERS IN COMPUTER SCIENCE | 2020 / Vol. 2

Keywords:

speech emotion recognition; raw waveform; end-to-end; complementary learning; acoustic representation; SPEECH; IDENTIFICATION; MODEL

DOI:

10.3389/fcomp.2020.00013

Chinese Library Classification (CLC) number:

TP39 [Computer applications]

Discipline classification codes:

081203; 0835

Abstract:

Research on advancing speech emotion recognition (SER) has attracted considerable attention because of its critical role in the scientific understanding of human behavior and its broad commercial applications. Conventionally, SER relies heavily on hand-crafted acoustic features. Recent deep learning work has attempted to model emotion directly from the raw waveform in an end-to-end learning scheme; however, this approach generally remains sub-optimal. An alternative direction has been proposed: enhancing and augmenting the knowledge-based acoustic representation with affect-related representations derived directly from the raw waveform. Here, we propose a complementary waveform-feature dual branch learning network, termed the Dual-Complementary Acoustic Embedding Network (DCaEN), to effectively integrate psychoacoustic knowledge and raw waveform embedding within an augmented feature space learning approach. DCaEN contains an acoustic feature embedding network and a raw waveform network, which are learned by integrating a negative cosine distance constraint into the loss function. The experimental results show that DCaEN achieves 59.31% and 46.73% unweighted average recall (UAR) on the USC IEMOCAP and MSP-IMPROV speech emotion databases, respectively, improving on models that use either hand-crafted acoustic features or the raw waveform alone, as well as on training without this loss constraint. Further analysis reveals a reverse mirroring pattern in the learned latent space, demonstrating the complementary nature of DCaEN's feature space learning.
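The abstract names two concrete ingredients: a negative cosine distance constraint added to the loss, and UAR as the evaluation metric. The Python sketch below illustrates both under stated assumptions. The abstract does not give the exact objective, so `dual_branch_loss` assumes a per-utterance cross-entropy term plus a weighted negative cosine distance between the two branch embeddings, with a hypothetical weight `lam`. All function names are illustrative and not taken from the paper. The sketch uses plain Python rather than a deep learning framework, so it only evaluates the formulas and computes no gradients.

```python
import math
from collections import defaultdict


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cross_entropy(logits, label):
    """Softmax cross-entropy for one example, using the log-sum-exp trick."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


def dual_branch_loss(logits, label, feat_emb, wave_emb, lam=0.1):
    """ASSUMED objective: classification loss + lam * negative cosine distance.

    feat_emb: embedding from the acoustic-feature branch (hypothetical input).
    wave_emb: embedding from the raw-waveform branch (hypothetical input).
    Negative cosine distance = -(1 - cos(feat_emb, wave_emb)), which lies in
    [-2, 0]; minimizing it pushes the two embeddings toward cos = -1.
    lam is an illustrative weight, not a value reported in the paper.
    """
    neg_cos_dist = -(1.0 - cosine_similarity(feat_emb, wave_emb))
    return cross_entropy(logits, label) + lam * neg_cos_dist


def unweighted_average_recall(y_true, y_pred):
    """UAR: mean of per-class recall, so every class counts equally."""
    hits, totals = defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += int(t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


if __name__ == "__main__":
    logits = [2.0, 0.5, 0.1, 0.1]  # toy 4-class scores for one utterance
    aligned = dual_branch_loss(logits, 0, [1.0, 2.0], [1.0, 2.0])
    mirrored = dual_branch_loss(logits, 0, [1.0, 2.0], [-1.0, -2.0])
    print(f"aligned: {aligned:.4f}, mirrored: {mirrored:.4f}")  # mirrored is ~0.2 lower

    y_true = [0, 0, 0, 0, 0, 0, 0, 1, 1, 2]
    y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 0, 0]
    print(unweighted_average_recall(y_true, y_pred))  # 0.5, although accuracy is 0.8
```

Under this assumed sign convention, the constraint rewards embeddings from the two branches that point in opposite directions, which is one plausible reading of the reverse mirroring pattern reported in the abstract. UAR (equivalent to macro-averaged recall) is not dominated by majority classes, which matters because emotion corpora are typically class-imbalanced.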


Pages: 13
