Speaker Adversarial Neural Network (SANN) for Speaker-independent Speech Emotion Recognition

Authors
Md Shah Fahad
Ashish Ranjan
Akshay Deepak
Gayadhar Pradhan
Affiliations
[1] National Institute of Technology Patna, Department of Computer Science and Engineering
[2] Vellore Institute of Technology, School of Computing Science and Engineering
[3] Siksha ‘O’ Anusandhan (Deemed to be University), Department of Computer Science and Engineering
[4] National Institute of Technology Patna, Department of Electronics and Communication
Keywords
Speech emotion recognition; Gradient reversal layer (GRL); Domain adversarial neural network (DANN); Speaker adversarial neural network (SANN); Speaker-independent; Speaker-invariant
Abstract
Recently, domain adversarial neural networks (DANN) have delivered promising results on out-of-domain data. This paper exploits DANN for speaker-independent emotion recognition, where the domain corresponds to speakers, i.e., the training and testing datasets contain different speakers. The result is a speaker adversarial neural network (SANN). The proposed SANN extracts speaker-invariant and emotion-specific discriminative features for the task of speech emotion recognition. To extract speaker-invariant features, multi-task adversarial training of a deep neural network (DNN) is employed. The DNN framework consists of two sub-networks: one for emotion classification (the primary task) and the other for speaker classification (the auxiliary task). A gradient reversal layer (GRL) is introduced between (a) the layer common to both the primary and auxiliary classifiers and (b) the auxiliary classifier. The objective of the GRL is to reduce the variance among speakers by maximizing the speaker classification loss. The proposed framework jointly optimizes the two sub-networks in a min-max fashion: it minimizes the emotion classification loss while, through the GRL, maximizing the speaker classification loss. The proposed network was evaluated on the IEMOCAP and EMODB datasets. A total of 1582 features were extracted using the standard openSMILE toolkit, and a subset of these features was then selected using a genetic algorithm. On the IEMOCAP dataset, the proposed SANN model achieved relative improvements of +6.025% (weighted accuracy) and +5.62% (unweighted accuracy) over the baseline system. Similar results were observed on the EMODB dataset. Further, despite differences in models and features, the proposed approach also achieved significant improvements in accuracy over state-of-the-art methods.
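To make the adversarial mechanism concrete, the following is a minimal PyTorch sketch of a SANN-style model: a shared encoder feeds an emotion head directly and a speaker head through a gradient reversal layer. The layer sizes, the numbers of emotion and speaker classes, and the reversal weight lambda are illustrative assumptions, not values taken from the paper; only the 1582-dimensional openSMILE input and the GRL placement follow the abstract.

```python
import torch
import torch.nn as nn

class GradientReversal(torch.autograd.Function):
    """Identity in the forward pass; multiplies the gradient by -lambda in the
    backward pass, so the shared encoder is pushed to MAXIMIZE the speaker loss."""
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None  # no gradient w.r.t. lambd

class SANN(nn.Module):
    """Shared encoder with two heads: emotion (primary) and speaker (adversarial).
    Hidden sizes and class counts are placeholders, not the paper's topology."""
    def __init__(self, n_features=1582, n_emotions=4, n_speakers=10, lambd=1.0):
        super().__init__()
        self.lambd = lambd
        self.encoder = nn.Sequential(
            nn.Linear(n_features, 512), nn.ReLU(),
            nn.Linear(512, 256), nn.ReLU(),
        )
        self.emotion_head = nn.Linear(256, n_emotions)
        self.speaker_head = nn.Linear(256, n_speakers)

    def forward(self, x):
        h = self.encoder(x)
        emo_logits = self.emotion_head(h)
        # The GRL sits between the shared layer and the speaker classifier.
        spk_logits = self.speaker_head(GradientReversal.apply(h, self.lambd))
        return emo_logits, spk_logits

# One joint training step on a dummy batch of openSMILE-style feature vectors.
model = SANN()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
ce = nn.CrossEntropyLoss()
x = torch.randn(32, 1582)
y_emo = torch.randint(0, 4, (32,))
y_spk = torch.randint(0, 10, (32,))
emo_logits, spk_logits = model(x)
loss = ce(emo_logits, y_emo) + ce(spk_logits, y_spk)
opt.zero_grad()
loss.backward()
opt.step()
```

Because the GRL flips the speaker gradient on its way back into the encoder, both heads can be trained by ordinary minimization of a single summed loss: the speaker head still learns to classify speakers, while the encoder is simultaneously driven toward speaker-invariant features, which is exactly the min-max objective described in the abstract.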
Pages: 6113–6135
Number of pages: 22