Multi-scale discrepancy adversarial network for cross-corpus speech emotion recognition

Cited by: 4
Authors
Zheng, Wanlu [1]
Zheng, Wenming [1]
Zong, Yuan [1]
Affiliations
[1] Zheng, Wanlu
[2] Zheng, Wenming
[3] Zong, Yuan
Source
Zheng, Wenming (wenming_zheng@seu.edu.cn) | 1600 / KeAi Communications Co. / Issue 03
Keywords
Adversarial networks - Critical issues - Cross-corpus speech emotion recognition - Domain adaptation - Hierarchical discriminator - Multi-scales - Recognizing Human Emotion - Speech emotion recognition - Testing data - Training data
DOI
10.1016/j.vrih.2020.11.006
Abstract
Background: Recognizing human emotions from speech is one of the most critical issues in human-computer interaction applications. In recent years, the challenging problem of cross-corpus speech emotion recognition (SER) has attracted extensive research. Nevertheless, the domain discrepancy between training data and testing data remains a major obstacle to improving system performance. Methods: This paper introduces a novel multi-scale discrepancy adversarial (MSDA) network that performs domain adaptation at multiple timescales for cross-corpus SER, i.e., it integrates domain discriminators at hierarchical levels into the emotion recognition framework to mitigate the gap between the source and target domains. Specifically, we extract two kinds of speech features, i.e., handcrafted features and deep features, at three timescales: global, local, and hybrid. At each timescale, the domain discriminator and the emotion classifier compete against each other to learn features that minimize the discrepancy between the two domains by fooling the discriminator. Results: Extensive cross-corpus and cross-language SER experiments were conducted on a combined dataset comprising one Chinese dataset and two English datasets commonly used in SER. The MSDA benefits from the strong discriminative power provided by the adversarial process, in which three discriminators work in tandem with an emotion classifier. Accordingly, the MSDA outperforms all baseline methods. Conclusions: The proposed architecture was tested on a combination of one Chinese and two English datasets. The experimental results demonstrate the superiority of the proposed discriminative model for cross-corpus SER. © 2019 Beijing Zhongke Journal Publishing Co. Ltd
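The adversarial setup described in the Methods can be pictured with a generic domain-adversarial objective. The sketch below is an assumption based on the standard gradient-reversal formulation, not the formulation given in the paper: the symbols $\theta_f$ (feature extractor), $\theta_y$ (emotion classifier), $\theta_d^{s}$ (domain discriminator at timescale $s$), the losses $L_y$ and $L_d^{s}$, and the trade-off weight $\lambda$ are all introduced here for illustration, with $s$ ranging over the global ($g$), local ($l$), and hybrid ($h$) timescales named in the abstract.

```latex
\begin{aligned}
% Assumed combined objective: emotion loss minus the weighted sum of the
% three per-timescale domain losses (hypothetical notation).
E\bigl(\theta_f, \theta_y, \{\theta_d^{s}\}\bigr)
  &= L_y(\theta_f, \theta_y)
   - \lambda \sum_{s \in \{g,\, l,\, h\}} L_d^{s}\bigl(\theta_f, \theta_d^{s}\bigr), \\
% Feature extractor and emotion classifier minimize E, which pushes the
% features toward being indistinguishable across source and target domains.
(\hat{\theta}_f, \hat{\theta}_y)
  &= \arg\min_{\theta_f,\, \theta_y} E\bigl(\theta_f, \theta_y, \{\hat{\theta}_d^{s}\}\bigr), \\
% Each discriminator maximizes E, i.e. minimizes its own domain loss.
\{\hat{\theta}_d^{s}\}
  &= \arg\max_{\{\theta_d^{s}\}} E\bigl(\hat{\theta}_f, \hat{\theta}_y, \{\theta_d^{s}\}\bigr).
\end{aligned}
```

Under this reading, "fooling the discriminator" corresponds to the minus sign on the domain losses: the feature parameters are updated to increase each $L_d^{s}$ while the discriminators try to decrease it. Whether the paper shares one feature extractor across timescales or uses a separate one per scale is not stated in the abstract.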
Pages: 65-75