A Waveform-Feature Dual Branch Acoustic Embedding Network for Emotion Recognition

Cited by: 3
Authors
Li, Jeng-Lin [1, 2]
Huang, Tzu-Yun [1, 2]
Chang, Chun-Min [1, 2]
Lee, Chi-Chun [1, 2]
Affiliations
[1] Natl Tsing Hua Univ, Dept Elect Engn, Hsinchu, Taiwan
[2] MOST Joint Res Ctr AI Technol & All Vista Hlthca, Taipei, Taiwan
Source
FRONTIERS IN COMPUTER SCIENCE | 2020 / Vol. 2
Keywords
speech emotion recognition; raw waveform; end-to-end; complementary learning; acoustic representation; SPEECH; IDENTIFICATION; MODEL;
DOI
10.3389/fcomp.2020.00013
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Research on speech emotion recognition (SER) has attracted considerable attention because of its critical role both in the scientific understanding of human behavior and in a wide range of commercial applications. Conventionally, SER relies heavily on hand-crafted acoustic features. Recent progress in deep learning has made it possible to model emotion directly from the raw waveform in an end-to-end learning scheme; however, this approach generally remains sub-optimal. An alternative direction augments the knowledge-based acoustic representation with an affect-related representation derived directly from the raw waveform. Here, we propose a complementary waveform-feature dual branch learning network, termed the Dual-Complementary Acoustic Embedding Network (DCaEN), to integrate psychoacoustic knowledge and raw waveform embedding within an augmented feature space learning approach. DCaEN consists of an acoustic feature embedding network and a raw waveform network, which are learned with a negative cosine distance constraint integrated into the loss function. Experimental results show that DCaEN achieves 59.31% and 46.73% unweighted average recall (UAR) on the USC IEMOCAP and MSP-IMPROV speech emotion databases, respectively, improving on models that use only hand-crafted acoustic features or only the raw waveform, as well as on the dual-branch model trained without this loss constraint. Further analysis reveals a reverse mirroring pattern in the learned latent space, demonstrating the complementary nature of DCaEN feature space learning.
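The abstract names two quantitative ingredients: a loss augmented with a negative cosine distance constraint between the two branch embeddings, and unweighted average recall (UAR) as the evaluation metric. The following Python sketch illustrates both under stated assumptions. It reads the constraint as subtracting a weighted cosine distance from a cross-entropy classification loss, so that minimizing the total pushes the hand-crafted-feature embedding and the raw-waveform embedding apart; this reading, the function names, and the `weight` parameter are hypothetical and not taken from the paper. The UAR function follows the standard definition (per-class recall, averaged with equal weight per class).

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cross_entropy(logits, label):
    """Softmax cross-entropy for one example, stabilized by subtracting the max logit."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def dual_branch_loss(logits, label, feat_emb, wave_emb, weight=0.1):
    """Hypothetical dual-branch loss: cross-entropy plus a negative cosine distance term.

    Cosine distance is 1 - cos(u, v), in [0, 2]. Adding its negative to a
    minimized loss rewards a *larger* distance between the feature-branch and
    waveform-branch embeddings (an assumed reading of the complementary
    constraint). `weight` is an illustrative hyperparameter, not from the paper.
    """
    cos_dist = 1.0 - cosine_similarity(feat_emb, wave_emb)
    return cross_entropy(logits, label) - weight * cos_dist


def unweighted_average_recall(y_true, y_pred):
    """UAR: recall computed separately for each class, then averaged equally."""
    classes = sorted(set(y_true))
    recalls = []
    for c in classes:
        total = sum(1 for t in y_true if t == c)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        recalls.append(hits / total)
    return sum(recalls) / len(recalls)
```

For example, with `y_true = [0, 0, 0, 1]` and `y_pred = [0, 0, 1, 1]`, class 0 has recall 2/3 and class 1 has recall 1, so UAR is (2/3 + 1) / 2, about 0.833, whereas plain accuracy would be 3/4. Because every class counts equally regardless of its frequency, UAR is commonly reported on emotion corpora with imbalanced label distributions.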
Pages: 13
Related papers (50 in total)
  • [21] Acoustic Feature Excitation-and-Aggregation Network Based on Multi-Task Learning for Speech Emotion Recognition
    Qi, Xin
    Song, Qing
    Chen, Guowei
    Zhang, Pengzhou
    Fu, Yao
    ELECTRONICS, 2025, 14 (05):
  • [22] Bi-Branch Vision Transformer Network for EEG Emotion Recognition
    Lu, Wei
    Tan, Tien-Ping
    Ma, Hua
    IEEE ACCESS, 2023, 11 : 36233 - 36243
  • [23] Abnormal Fastener Recognition via Dual-Branch Supervised Contrastive Learning Network With Hard Feature Synthesis
    Wang, Jianzhu
    Wu, Jianqing
    Wang, Shengchun
    Zhao, Xinxin
    Li, Qingyong
    IEEE SENSORS JOURNAL, 2024, 24 (18) : 29365 - 29376
  • [24] MSDSANet: Multimodal Emotion Recognition Based on Multi-Stream Network and Dual-Scale Attention Network Feature Representation
    Sun, Weitong
    Yan, Xingya
    Su, Yuping
    Wang, Gaihua
    Zhang, Yumei
    SENSORS, 2025, 25 (07)
  • [25] Enhancing Speech Emotion Recognition Using Dual Feature Extraction Encoders
    Pulatov, Ilkhomjon
    Oteniyazov, Rashid
    Makhmudov, Fazliddin
    Cho, Young-Im
    SENSORS, 2023, 23 (14)
  • [26] Emotion recognition with attention mechanism-guided dual-feature multi-path interaction network
    Li, Yaxuan
    Guo, Wenhui
    Wang, Yanjiang
    SIGNAL IMAGE AND VIDEO PROCESSING, 2024, 18 (SUPPL 1) : 617 - 626
  • [27] Dual-branch network based on transformer for texture recognition
    Liu, Yangqi
    Dong, Hao
    Wang, Guodong
    Chen, Chenglizhao
    DIGITAL SIGNAL PROCESSING, 2024, 153
  • [28] DBMF: Dual Branch Multiscale Feature Fusion Network for polyp segmentation
    Liu, Fangjin
    Hua, Zhen
    Li, Jinjiang
    Fan, Linwei
    COMPUTERS IN BIOLOGY AND MEDICINE, 2022, 151
  • [29] Improving Spontaneous Children's Emotion Recognition by Acoustic Feature Selection and Feature-Level Fusion of Acoustic and Linguistic Parameters
    Planet, Santiago
    Iriondo, Ignasi
    ADVANCES IN NONLINEAR SPEECH PROCESSING, 2011, 7015 : 88 - 95
  • [30] Vehicle Detection Algorithm Based on Dual Branch Feature Aggregation Network
    Lyu, Meng
    Mao, Shenghui
    Chai, Liang
    Gao, Pengfei
    Shi, Lei
    Computer Engineering and Applications, 2024, 60 (22) : 240 - 250