Cross-Corpus Speech Emotion Recognition Based on Deep Domain-Adaptive Convolutional Neural Network

被引:18
|
作者
Liu, Jiateng [1 ]
Zheng, Wenming [1 ]
Zong, Yuan [1 ]
Lu, Cheng [2 ]
Tang, Chuangao [1 ]
机构
[1] Southeast Univ, Minist Educ, Key Lab Child Dev & Learning Sci, Nanjing 210096, Peoples R China
[2] Southeast Univ, Sch Informat Sci & Engn, Nanjing 210096, Peoples R China
基金
中国国家自然科学基金;
关键词
cross-corpus speech emotion recognition; deep convolutional neural network; domain adaptation;
D O I
10.1587/transinf.2019EDL8136
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
In this letter, we propose a novel deep domain-adaptive convolutional neural network (DDACNN) model to handle the challenging cross-corpus speech emotion recognition (SER) problem. The framework of the DDACNN model consists of two components: a feature extraction model based on a deep convolutional neural network (DCNN) and a domain-adaptive (DA) layer added in the DCNN utilizing the maximum mean discrepancy (MMD) criterion. We use labeled spectrograms from source speech corpus combined with unlabeled spectrograms from target speech corpus as the input of two classic DCNNs to extract the emotional features of speech, and train the model with a special mixed loss combined with a cross-entrophy loss and an MMD loss. Compared to other classic cross-corpus SER methods, the major advantage of the DDACNN model is that it can extract robust speech features which are time-frequency related by spectrograms and narrow the discrepancies between feature distribution of source corpus and target corpus to get better cross-corpus performance. Through several cross-corpus SER experiments, our DDACNN achieved the state-of-the-art performance on three public emotion speech corpora and is proved to handle the cross-corpus SER problem efficiently.
引用
收藏
页码:459 / 463
页数:5
相关论文
共 50 条
  • [21] Cross-corpus speech emotion recognition using subspace learning and domain adaption
    Cao, Xuan
    Jia, Maoshen
    Ru, Jiawei
    Pai, Tun-wen
    EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2022, 2022 (01)
  • [22] CROSS-CORPUS SPEECH EMOTION RECOGNITION USING JOINT DISTRIBUTION ADAPTIVE REGRESSION
    Zhang, Jiacheng
    Jiang, Lin
    Zong, Yuan
    Zheng, Wenming
    Zhao, Li
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 3790 - 3794
  • [23] Cross-Corpus Speech Emotion Recognition Based on Few-Shot Learning and Domain Adaptation
    Ahn, Youngdo
    Lee, Sung Joo
    Shin, Jong Won
    IEEE SIGNAL PROCESSING LETTERS, 2021, 28 : 1190 - 1194
  • [24] Progressive distribution adapted neural networks for cross-corpus speech emotion recognition
    Zong, Yuan
    Lian, Hailun
    Zhang, Jiacheng
    Feng, Ercui
    Lu, Cheng
    Chang, Hongli
    Tang, Chuangao
    FRONTIERS IN NEUROROBOTICS, 2022, 16
  • [25] Optimized cross-corpus speech emotion recognition framework based on normalized 1D convolutional neural network with data augmentation and feature selection
    Barsainyan N.
    Singh D.K.
    International Journal of Speech Technology, 2023, 26 (04) : 947 - 961
  • [26] Improving Cross-Corpus Speech Emotion Recognition with Adversarial Discriminative Domain Generalization (ADDoG)
    Gideon, John
    McInnis, Melvin G.
    Provost, Emily Mower
    IEEE TRANSACTIONS ON AFFECTIVE COMPUTING, 2021, 12 (04) : 1055 - 1068
  • [27] Cross-Corpus Speech Emotion Recognition Based on Sparse Subspace Transfer Learning
    Zhao, Keke
    Song, Peng
    Zhang, Wenjing
    Zhang, Weijian
    Li, Shaokai
    Chen, Dongliang
    Zheng, Wenming
    BIOMETRIC RECOGNITION (CCBR 2021), 2021, 12878 : 466 - 473
  • [28] A STUDY ON CROSS-CORPUS SPEECH EMOTION RECOGNITION AND DATA AUGMENTATION
    Braunschweiler, Norbert
    Doddipatla, Rama
    Keizer, Simon
    Stoyanchev, Svetlana
    2021 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU), 2021, : 24 - 30
  • [29] Auditory attention model based on Chirplet for cross-corpus speech emotion recognition
    Zhang X.
    Song P.
    Zha C.
    Tao H.
    Zhao L.
    Zhao, Li (zhaoli@seu.edu.cn), 1600, Southeast University (32): : 402 - 407
  • [30] Multi-scale discrepancy adversarial network for cross-corpus speech emotion recognition
    Wanlu ZHENG
    Wenming ZHENG
    Yuan ZONG
    虚拟现实与智能硬件(中英文), 2021, 3 (01) : 65 - 75