Multi-scale discrepancy adversarial network for cross-corpus speech emotion recognition

Cited by: 4
Authors
Zheng, Wanlu
Zheng, Wenming
Zong, Yuan
Source
Virtual Reality & Intelligent Hardware, 2021, Vol. 3, No. 1 | KeAi Communications Co. | Corresponding author: Zheng, Wenming (wenming_zheng@seu.edu.cn)
Keywords
Adversarial networks - Critical issues - Cross-corpus speech emotion recognition - Domain adaptation - Hierarchical discriminator - Multi-scales - Recognizing Human Emotion - Speech emotion recognition - Testing data - Training data;
DOI
10.1016/j.vrih.2020.11.006
Abstract
Background: One of the most critical issues in human-computer interaction is recognizing human emotion from speech. In recent years, the challenging problem of cross-corpus speech emotion recognition (SER) has attracted extensive research. Nevertheless, the domain discrepancy between training and testing data remains a major obstacle to improved system performance.
Methods: This paper introduces a novel multi-scale discrepancy adversarial (MSDA) network that performs domain adaptation over multiple timescales for cross-corpus SER, i.e., it integrates domain discriminators at hierarchical levels into the emotion recognition framework to mitigate the gap between the source and target domains. Specifically, two kinds of speech features, handcrafted features and deep features, are extracted at three timescales: the global, local, and hybrid levels. At each timescale, the domain discriminator and the emotion classifier compete against each other, learning features that minimize the discrepancy between the two domains by fooling the discriminator.
Results: Extensive cross-corpus and cross-language SER experiments were conducted on a combined dataset built from one Chinese corpus and two English corpora commonly used in SER. The MSDA benefits from the strong discriminative power of the adversarial process, in which three discriminators work in tandem with an emotion classifier; accordingly, the MSDA achieves the best performance among all baseline methods.
Conclusions: The proposed architecture was tested on a combination of one Chinese and two English datasets, and the experimental results demonstrate the superiority of this discriminative model for cross-corpus SER. © 2019 Beijing Zhongke Journal Publishing Co. Ltd
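The adversarial scheme described in the Methods (an emotion classifier and domain discriminators competing over shared features) is typically implemented with a gradient reversal layer: the forward pass is the identity, while the backward pass negates the gradient so the feature extractor learns to fool the discriminator. The paper does not publish code; the NumPy sketch below is only illustrative of that mechanism, and the `lam` trade-off weight, layer sizes, and learning rate are assumed values, not the authors' configuration.

```python
import numpy as np

class GradReverse:
    """Gradient reversal layer (GRL): identity in the forward pass,
    negated and scaled gradient in the backward pass, so the feature
    extractor is trained to fool the domain discriminator."""

    def __init__(self, lam=1.0):
        self.lam = lam  # adversarial trade-off weight (assumed value)

    def forward(self, x):
        return x  # features pass through unchanged

    def backward(self, grad_output):
        return -self.lam * grad_output  # sign flip toward the encoder


# Toy adversarial step: a linear encoder and a logistic domain discriminator.
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 5))             # 8 utterance-level feature vectors
d = np.array([0, 0, 0, 0, 1, 1, 1, 1])  # domain labels: 0=source, 1=target

W_enc = rng.normal(size=(5, 3)) * 0.1   # encoder weights
w_dis = rng.normal(size=(3,)) * 0.1     # discriminator weights

grl = GradReverse(lam=0.5)
z = grl.forward(x @ W_enc)                # encoded features
p = 1.0 / (1.0 + np.exp(-(z @ w_dis)))    # P(domain = target)

# Binary cross-entropy gradients
g_logit = (p - d) / len(d)      # dL/dlogit
g_wdis = z.T @ g_logit          # discriminator descends this gradient
g_z = np.outer(g_logit, w_dis)  # gradient arriving at the GRL
g_enc_in = grl.backward(g_z)    # reversed gradient for the encoder
W_enc -= 0.1 * (x.T @ g_enc_in)  # encoder effectively ascends the domain loss
w_dis -= 0.1 * g_wdis            # discriminator descends it
```

With one such discriminator per timescale (global, local, hybrid), the reversed gradients from all three would be combined with the emotion-classification gradient when updating the shared feature extractor.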
Pages: 65-75
Related Papers
50 records
  • [1] Multi-scale discrepancy adversarial network for cross-corpus speech emotion recognition
    Wanlu ZHENG
    Wenming ZHENG
    Yuan ZONG
    [J]. 虚拟现实与智能硬件(中英文), 2021, 3 (01) : 65 - 75
  • [2] A Lightweight Multi-Scale Model for Speech Emotion Recognition
    Li, Haoming
    Zhao, Daqi
    Wang, Jingwen
    Wang, Deqiang
    [J]. IEEE ACCESS, 2024, 12 : 130228 - 130240
  • [3] A Multi-scale Fusion Framework for Bimodal Speech Emotion Recognition
    Chen, Ming
    Zhao, Xudong
    [J]. INTERSPEECH 2020, 2020, : 374 - 378
  • [4] EFFICIENT SPEECH EMOTION RECOGNITION USING MULTI-SCALE CNN AND ATTENTION
    Peng, Zixuan
    Lu, Yu
    Pan, Shengfeng
    Liu, Yunfeng
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 3020 - 3024
  • [5] GM-TCNet: Gated Multi-scale Temporal Convolutional Network using Emotion Causality for Speech Emotion Recognition
    Ye, Jia-Xin
    Wen, Xin-Cheng
    Wang, Xuan-Ze
    Xu, Yong
    Luo, Yan
    Wu, Chang-Li
    Chen, Li-Yan
    Liu, Kun-Hong
    [J]. SPEECH COMMUNICATION, 2022, 145 : 21 - 35
  • [6] Adversarial Data Augmentation Network for Speech Emotion Recognition
    Yi, Lu
    Mak, Man-Wai
    [J]. 2019 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2019, : 529 - 534
  • [7] Learning multi-scale features for speech emotion recognition with connection attention mechanism
    Chen, Zengzhao
    Li, Jiawen
    Liu, Hai
    Wang, Xuyang
    Wang, Hu
    Zheng, Qiuyu
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2023, 214
  • [8] Multi-scale Generative Adversarial Networks for Speech Enhancement
    Li, Yihang
    Jiang, Ting
    Qin, Shan
    [J]. 2019 7TH IEEE GLOBAL CONFERENCE ON SIGNAL AND INFORMATION PROCESSING (IEEE GLOBALSIP), 2019,
  • [9] Improving Speech Emotion Recognition With Adversarial Data Augmentation Network
    Yi, Lu
    Mak, Man-Wai
    [J]. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2022, 33 (01) : 172 - 184
  • [10] SPEECH EMOTION RECOGNITION WITH GLOBAL-AWARE FUSION ON MULTI-SCALE FEATURE REPRESENTATION
    Zhu, Wenjing
    Li, Xiang
    [J]. 2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 6437 - 6441