SMOOTHING MODEL PREDICTIONS USING ADVERSARIAL TRAINING PROCEDURES FOR SPEECH BASED EMOTION RECOGNITION

Cited by: 0

Authors:

Sahu, Saurabh [1]

Gupta, Rahul [2]

Sivaraman, Ganesh [1]

Espy-Wilson, Carol [1]

Affiliations:

[1] University of Maryland, Speech Communication Laboratory, College Park, MD 20742, USA

[2] Amazon.com, Seattle, WA, USA

Source:

2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2018

Keywords:

Adversarial training; Manifold regularization; Speech emotion recognition;

DOI:

Not available

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

Training discriminative classifiers involves learning a conditional distribution p(y_i | x_i), given a set of feature vectors x_i and the corresponding labels y_i, i = 1, ..., N. For a classifier to generalize and not overfit to the training data, the resulting conditional distribution p(y_i | x_i) should vary smoothly over the inputs x_i. Adversarial training procedures enforce this smoothness using manifold regularization techniques. Manifold regularization makes the model's output distribution more robust to local perturbations added to a datapoint x_i. In this paper, we experiment with the application of adversarial training procedures to increase the accuracy of a deep neural network based emotion recognition system using speech cues. Specifically, we investigate two training procedures: (i) adversarial training, where we determine the adversarial direction based on the given labels for the training data, and (ii) virtual adversarial training, where we determine the adversarial direction based only on the output distribution of the training data. We demonstrate the efficacy of adversarial training procedures by performing a k-fold cross-validation experiment on the Interactive Emotional Dyadic Motion Capture (IEMOCAP) corpus and a cross-corpus performance analysis on three separate corpora. Results show improvement over a purely supervised approach, as well as better generalization capability in cross-corpus settings.
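The difference between the two procedures named in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes a linear softmax classifier in place of the deep neural network, so that the input gradients have a closed form, and all function names and parameters (`eps`, `xi`, `n_iter`, `seed`) are hypothetical. The virtual adversarial direction is approximated here by power iteration on the KL divergence, which is one common way to compute it and is assumed rather than taken from this record.

```python
import math
import random


def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def predict(W, x):
    """p(y | x) for a linear softmax classifier (an assumption of this sketch)."""
    return softmax([sum(w * xi for w, xi in zip(row, x)) for row in W])


def kl(p, q):
    """KL divergence KL(p || q) between two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def scale_to_norm(g, norm):
    """Rescale vector g to the given L2 norm (zero vector if g is zero)."""
    n = math.sqrt(sum(v * v for v in g))
    return [norm * v / n for v in g] if n > 0 else [0.0] * len(g)


def input_grad(W, residual):
    """Gradient w.r.t. the input, W^T residual, for softmax-based losses."""
    return [sum(W[k][j] * residual[k] for k in range(len(W)))
            for j in range(len(W[0]))]


def adversarial_perturbation(W, x, y, eps):
    """Label-based direction: gradient of the cross-entropy -log p(y | x)."""
    p = predict(W, x)
    residual = [pk - (1.0 if k == y else 0.0) for k, pk in enumerate(p)]
    return scale_to_norm(input_grad(W, residual), eps)


def virtual_adversarial_perturbation(W, x, eps, xi=1e-3, n_iter=1, seed=0):
    """Label-free direction: approximately maximizes KL(p(.|x) || p(.|x + r))."""
    rng = random.Random(seed)
    p = predict(W, x)
    d = scale_to_norm([rng.gauss(0.0, 1.0) for _ in x], 1.0)
    for _ in range(n_iter):
        probe = [a + xi * b for a, b in zip(x, d)]
        q = predict(W, probe)
        # Gradient of KL(p || q) w.r.t. the perturbation is W^T (q - p).
        residual = [qk - pk for qk, pk in zip(q, p)]
        d = scale_to_norm(input_grad(W, residual), 1.0)
    return scale_to_norm(d, eps)
```

In a training loop, either perturbation would be added to x_i and the model penalized for changing its output at the perturbed point, which is the smoothness constraint the abstract refers to. The first function needs the label y; the second uses only the model's own output distribution.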

Pages: 4934-4938

Page count: 5
