SMOOTHING MODEL PREDICTIONS USING ADVERSARIAL TRAINING PROCEDURES FOR SPEECH BASED EMOTION RECOGNITION

Cited by: 0
Authors
Sahu, Saurabh [1]
Gupta, Rahul [2]
Sivaraman, Ganesh [1]
Espy-Wilson, Carol [1]
Affiliations
[1] Univ Maryland, Speech Commun Lab, College Pk, MD 20742 USA
[2] Amazon Com, Seattle, WA USA
Keywords
Adversarial training; Manifold regularization; Speech emotion recognition
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Training discriminative classifiers involves learning a conditional distribution $p(y_i \mid x_i)$, given a set of feature vectors $x_i$ and the corresponding labels $y_i$, $i = 1, \dots, N$. For a classifier to generalize and not overfit to the training data, the resulting conditional distribution $p(y_i \mid x_i)$ should vary smoothly over the inputs $x_i$. Adversarial training procedures enforce this smoothness using manifold regularization techniques. Manifold regularization makes the model's output distribution more robust to local perturbations added to a datapoint $x_i$. In this paper, we experiment with the application of adversarial training procedures to increase the accuracy of a deep neural network based emotion recognition system using speech cues. Specifically, we investigate two training procedures: (i) adversarial training, where we determine the adversarial direction based on the given labels for the training data, and (ii) virtual adversarial training, where we determine the adversarial direction based only on the output distribution of the training data. We demonstrate the efficacy of adversarial training procedures by performing a k-fold cross-validation experiment on the Interactive Emotional Dyadic Motion Capture (IEMOCAP) corpus and a cross-corpus performance analysis on three separate corpora. Results show improvement over a purely supervised approach, as well as better generalization capability to cross-corpus settings.
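The two adversarial directions named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a linear softmax classifier as a stand-in for the deep neural network so that gradients can be written analytically, and the function names, the radius `eps`, the probe scale `xi`, and the single power-iteration step are all hypothetical choices made for illustration.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D vector of logits."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def unit(v, tiny=1e-12):
    """Scale v to (approximately) unit L2 norm."""
    return v / (np.linalg.norm(v) + tiny)

def kl(p, q):
    """KL divergence KL(p || q) between two discrete distributions."""
    return float(np.sum(p * (np.log(p) - np.log(q))))

def adversarial_perturbation(W, b, x, y, eps):
    """Label-based direction: gradient of the cross-entropy loss w.r.t. x.

    For the assumed linear softmax model p = softmax(W x + b), the gradient
    of -log p[y] with respect to x is W^T (p - onehot(y)).
    """
    p = softmax(W @ x + b)
    onehot = np.eye(W.shape[0])[y]
    return eps * unit(W.T @ (p - onehot))

def virtual_adversarial_perturbation(W, b, x, eps, xi=1e-2, n_iter=1, rng=None):
    """Label-free direction: approximately maximizes KL(p(.|x) || p(.|x + r)).

    Starts from a random unit vector and refines it by power iteration,
    using the gradient of the KL term at the small probe point x + xi * d.
    For the assumed linear softmax model that gradient is W^T (q - p).
    """
    rng = np.random.default_rng(0) if rng is None else rng
    p = softmax(W @ x + b)  # current prediction, treated as a fixed target
    d = unit(rng.standard_normal(x.shape))
    for _ in range(n_iter):
        q = softmax(W @ (x + xi * d) + b)
        d = unit(W.T @ (q - p))
    return eps * d
```

In a training loop, one would typically add a penalty evaluated at the perturbed input to the supervised loss: the cross-entropy at `x + r_adv` for adversarial training, or `kl(p, q)` at `x + r_vadv` for virtual adversarial training. Because the second term needs no label, it can in principle also be computed on unlabeled data.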
Pages: 4934-4938
Number of pages: 5