Unsupervised Cross-Lingual Speech Emotion Recognition Using Domain Adversarial Neural Network

Cited by: 4
Authors
Cai, Xiong [1]
Wu, Zhiyong [1, 2]
Zhong, Kuo [1]
Su, Bin [1]
Dai, Dongyang [1]
Meng, Helen [1, 2]
Affiliations
[1] Tsinghua Univ, Shenzhen Int Grad Sch, Tsinghua CUHK Joint Res Ctr Media Sci Technol & S, Shenzhen, Peoples R China
[2] Chinese Univ Hong Kong, Dept Syst Engn & Engn Management, Shatin, Hong Kong, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
speech emotion recognition; domain adversarial learning; cross-lingual; affective representation learning;
DOI
10.1109/ISCSLP49672.2021.9362058
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Deep learning approaches have achieved excellent results for Speech Emotion Recognition (SER) within a single domain. However, cross-domain SER remains challenging because of the distribution shift between source and target domains. In this work, we propose a Domain Adversarial Neural Network (DANN) based approach to mitigate this distribution shift for cross-lingual SER. Specifically, we add a language classifier and a gradient reversal layer after the feature extractor to force the learned representation to be both language-independent and emotion-meaningful. Our method is unsupervised, i.e., labels on the target language are not required, which makes it easier to apply to other languages. Experimental results show that the proposed method provides an average absolute improvement of 3.91% over the baseline system on the arousal and valence classification tasks. Furthermore, we find that batch normalization contributes to the performance gain of DANN, so we also explore how different ways of combining data for batch normalization affect the results.
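The gradient reversal idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the scalar feature, the two binary logistic heads, the weight names, and the reversal strength `lam` are all assumptions chosen to keep the arithmetic visible. The layer is an identity in the forward pass and multiplies the incoming gradient by `-lam` in the backward pass, so the feature extractor is pushed to make the language classifier worse while still serving the emotion classifier. Passing `emo_label=None` mimics an unlabeled target-language sample, which contributes only through the language branch.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def grl_forward(features):
    # Forward pass of the gradient reversal layer: plain identity.
    return list(features)


def grl_backward(grad_output, lam):
    # Backward pass: flip the sign of the gradient and scale it by lam.
    return [-lam * g for g in grad_output]


def bce_grad_wrt_logit(logit, label):
    # Derivative of sigmoid + binary cross-entropy with respect to the logit.
    return sigmoid(logit) - label


def feature_gradient(feature, emo_w, lang_w, emo_label, lang_label, lam):
    """Gradient reaching one scalar feature from both heads (hypothetical setup).

    emo_label may be None for an unlabeled target-language sample; in that
    case only the language branch contributes a gradient.
    """
    if emo_label is None:
        grad_emo = 0.0
    else:
        grad_emo = bce_grad_wrt_logit(emo_w * feature, emo_label) * emo_w

    # The language head sees the feature through the reversal layer.
    lang_input = grl_forward([feature])[0]
    grad_lang = bce_grad_wrt_logit(lang_w * lang_input, lang_label) * lang_w
    reversed_lang = grl_backward([grad_lang], lam)[0]

    return grad_emo + reversed_lang, grad_emo, grad_lang


if __name__ == "__main__":
    total, g_emo, g_lang = feature_gradient(
        feature=1.0, emo_w=0.5, lang_w=2.0, emo_label=1, lang_label=0, lam=1.0
    )
    print("emotion gradient:", g_emo)
    print("language gradient before reversal:", g_lang)
    print("gradient applied to the feature:", total)
```

With `lam` set to 0 the language branch has no influence on the feature extractor and the setup reduces to ordinary supervised training on the source language; larger values weight language confusion more heavily against emotion accuracy.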
Pages: 5