Speaker Adversarial Neural Network (SANN) for Speaker-independent Speech Emotion Recognition

Citations: 0
Authors
Md Shah Fahad
Ashish Ranjan
Akshay Deepak
Gayadhar Pradhan
Affiliations
[1] National Institute of Technology Patna, Department of Computer Science and Engineering
[2] Vellore Institute of Technology, School of Computing Science and Engineering
[3] Siksha ‘O’ Anusandhan (Deemed to be University), Department of Computer Science and Engineering
[4] National Institute of Technology Patna, Department of Electronics and Communication
Keywords
Speech emotion recognition; Gradient reversal layer (GRL); Domain adversarial neural network (DANN); Speaker adversarial neural network (SANN); Speaker-independent; Speaker-invariant
DOI
Not available
Abstract
Recently, domain adversarial neural networks (DANN) have delivered promising results on out-of-domain data. This paper exploits DANN for speaker-independent emotion recognition, where the domain corresponds to speakers, i.e., the training and testing sets contain disjoint speakers. The result is a speaker adversarial neural network (SANN). The proposed SANN extracts speaker-invariant yet emotion-discriminative features for speech emotion recognition. To extract speaker-invariant features, multi-task adversarial training of a deep neural network (DNN) is employed. The DNN framework consists of two sub-networks: one for emotion classification (the primary task) and one for speaker classification (the auxiliary task). A gradient reversal layer (GRL) is inserted between (a) the layer shared by the primary and auxiliary classifiers and (b) the auxiliary classifier. The GRL reduces the variance among speakers by maximizing the speaker classification loss with respect to the shared representation. The framework jointly optimizes the two sub-networks, minimizing the emotion classification loss while playing a min-max game on the speaker classification loss: the speaker classifier minimizes it, while the GRL drives the shared layers to maximize it. The network was evaluated on the IEMOCAP and EMODB datasets. A total of 1582 features were extracted with the standard openSMILE toolkit, and a subset of these features was then selected using a genetic algorithm. On IEMOCAP, the proposed SANN model achieved relative improvements of +6.025% (weighted accuracy) and +5.62% (unweighted accuracy) over the baseline system; similar results were observed on EMODB. Further, despite differences in models and features, the proposed approach also achieved significant accuracy improvements over state-of-the-art methods.
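The core mechanism described in the abstract (a shared feature extractor with an emotion head, plus a speaker head placed behind a GRL) can be sketched in a few lines of PyTorch. The sketch below is illustrative, not the authors' implementation: the layer widths, the class counts (4 emotions, 8 training speakers), the lambda value, and the optimizer settings are assumptions; only the 1582-dimensional input follows the openSMILE feature count stated in the abstract (before the genetic-algorithm selection step).

import torch
import torch.nn as nn

class GradientReversal(torch.autograd.Function):
    # Identity in the forward pass; multiplies the gradient by -lambda
    # in the backward pass.
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None  # no gradient for lambd

class SANN(nn.Module):
    def __init__(self, n_features=1582, n_emotions=4, n_speakers=8, lambd=1.0):
        super().__init__()
        self.lambd = lambd
        # Shared feature extractor (widths are illustrative assumptions).
        self.shared = nn.Sequential(
            nn.Linear(n_features, 512), nn.ReLU(),
            nn.Linear(512, 256), nn.ReLU(),
        )
        self.emotion_head = nn.Linear(256, n_emotions)  # primary task
        self.speaker_head = nn.Linear(256, n_speakers)  # auxiliary task

    def forward(self, x):
        h = self.shared(x)
        # The GRL sits between the shared layer and the speaker classifier:
        # the speaker head learns to classify speakers, but the shared layers
        # receive reversed gradients and drift toward speaker invariance.
        spk = self.speaker_head(GradientReversal.apply(h, self.lambd))
        return self.emotion_head(h), spk

# One joint training step on dummy data.
model = SANN()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

x = torch.randn(32, 1582)               # batch of openSMILE feature vectors
y_emotion = torch.randint(0, 4, (32,))  # emotion labels
y_speaker = torch.randint(0, 8, (32,))  # speaker labels

emotion_logits, speaker_logits = model(x)
loss = criterion(emotion_logits, y_emotion) + criterion(speaker_logits, y_speaker)
optimizer.zero_grad()
loss.backward()
optimizer.step()

With this wiring, a single backward pass implements the min-max objective: the speaker head's parameters receive ordinary gradients and learn to identify speakers, while the shared layers receive the reversed gradient and learn features the speaker head cannot exploit.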
Pages: 6113-6135
Page count: 22
Related Papers
50 in total
  • [1] Speaker Adversarial Neural Network (SANN) for Speaker-independent Speech Emotion Recognition
    Fahad, Md Shah
    Ranjan, Ashish
    Deepak, Akshay
    Pradhan, Gayadhar
    [J]. CIRCUITS SYSTEMS AND SIGNAL PROCESSING, 2022, 41 (11) : 6113 - 6135
  • [2] On Speaker-Independent, Speaker-Dependent, and Speaker-Adaptive Speech Recognition
    Huang, Xuedong
    Lee, Kai-Fu
    [J]. IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 1993, 1 (02): : 150 - 157
  • [3] A SPEAKER-INDEPENDENT SPEECH RECOGNITION SYSTEM FOR TELEPHONE NETWORK APPLICATIONS
    Trnka, R.
    [J]. REVUE TECHNIQUE THOMSON-CSF, 1984, 16 (04): : 847 - 861
  • [4] Domain Invariant Feature Learning for Speaker-Independent Speech Emotion Recognition
    Lu, Cheng
    Zong, Yuan
    Zheng, Wenming
    Li, Yang
    Tang, Chuangao
    Schuller, Bjoern W.
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2022, 30 : 2217 - 2230
  • [5] Speaker adaptation techniques for speech recognition with a speaker-independent phonetic recognizer
    Kim, W. G.
    Jang, M.
    [J]. COMPUTATIONAL INTELLIGENCE AND SECURITY, PT 1, PROCEEDINGS, 2005, 3801 : 95 - 100
  • [6] SPEAKER-INDEPENDENT VOWEL RECOGNITION IN PERSIAN SPEECH
    Nazari, Mohammad
    Sayadiyan, Abolghasem
    Valiollahzadeh, Seyyed Majid
    [J]. 2008 3RD INTERNATIONAL CONFERENCE ON INFORMATION AND COMMUNICATION TECHNOLOGIES: FROM THEORY TO APPLICATIONS, VOLS 1-5, 2008, : 672 - 676
  • [7] SPEAKER-CONSISTENT PARSING FOR SPEAKER-INDEPENDENT CONTINUOUS SPEECH RECOGNITION
    Yamaguchi, K.
    Singer, H.
    Matsunaga, S.
    Sagayama, S.
    [J]. IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 1995, E78D (06) : 719 - 724
  • [8] Japanese Speaker-Independent Homonyms Speech Recognition
    Murakami, Jin'ichi
    Hotta, Haseo
    [J]. COMPUTATIONAL LINGUISTICS AND RELATED FIELDS, 2011, 27 : 306 - 313
  • [9] PREDICTOR CODEBOOK FOR SPEAKER-INDEPENDENT SPEECH RECOGNITION
    Kawabata, T.
    [J]. SYSTEMS AND COMPUTERS IN JAPAN, 1994, 25 (01) : 37 - 46
  • [10] Biomimetic pattern recognition for speaker-independent speech recognition
    Qin, H.
    Wang, S. J.
    Sun, H.
    [J]. PROCEEDINGS OF THE 2005 INTERNATIONAL CONFERENCE ON NEURAL NETWORKS AND BRAIN, VOLS 1-3, 2005, : 1290 - 1294