Contribution of modulation spectral features for cross-lingual speech emotion recognition under noisy reverberant conditions

Cited by: 0
Authors
Guo, Taiyang [1]
Li, Sixia [1]
Kidani, Shunsuke [1]
Okada, Shogo [1]
Unoki, Masashi [1]
Affiliations
[1] Japan Adv Inst Sci & Technol, 1-1 Asahidai, Nomi, Ishikawa 9231292, Japan
Funding
Japan Society for the Promotion of Science
DOI
10.1109/APSIPAASC58517.2023.10317449
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Handling multiple languages under noisy reverberant conditions has become increasingly important for speech emotion recognition (SER). Previous studies found that modulation spectral features (MSFs) are robust to noisy reverberant conditions for SER. However, those studies mainly focused on specific languages, so the universality of MSFs across languages is still unclear. To address this issue, we compared four feature sets for SER (MSFs, hand-crafted features, Wav2Vec2.0-based features, and MSFs+hand-crafted features) on four languages under 12 noisy reverberant conditions. Intra-lingual results showed that MSFs+hand-crafted features performed best under most conditions for all languages. Inter-lingual results showed that MSFs performed best under most conditions for the test languages, except when training on a tonal language and testing on the others. The results demonstrate that MSFs are robust for multilingual SER under noisy reverberant conditions and suggest that MSFs are potentially language-independent features for nontonal languages.
Pages: 2221-2227
Number of pages: 7
Related papers
50 in total
  • [1] Contribution of Common Modulation Spectral Features to Vocal-Emotion Recognition of Noise-Vocoded Speech in Noisy Reverberant Environments
    Guo, Taiyang
    Zhu, Zhi
    Kidani, Shunsuke
    Unoki, Masashi
    APPLIED SCIENCES-BASEL, 2022, 12 (19):
  • [2] Speech Emotion Recognition with Cross-lingual Databases
    Chiou, Bo-Chang
    Chen, Chia-Ping
    15TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2014), VOLS 1-4, 2014, : 558 - 561
  • [3] CROSS-LINGUAL AND MULTILINGUAL SPEECH EMOTION RECOGNITION ON ENGLISH AND FRENCH
    Neumann, Michael
    Ngoc Thang Vu
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 5769 - 5773
  • [4] Cross-lingual Speech Emotion Recognition through Factor Analysis
    Desplanques, Brecht
    Demuynck, Kris
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 3648 - 3652
  • [5] Semi-supervised cross-lingual speech emotion recognition
    Agarla, Mirko
    Bianco, Simone
    Celona, Luigi
    Napoletano, Paolo
    Petrovsky, Alexey
    Piccoli, Flavio
    Schettini, Raimondo
    Shanin, Ivan
    EXPERT SYSTEMS WITH APPLICATIONS, 2024, 237
  • [6] Speech Emotion Recognition in Noisy and Reverberant Environments
    Heracleous, Panikos
    Yasuda, Keiji
    Sugaya, Fumiaki
    Yoneyama, Akio
    Hashimoto, Masayuki
    2017 SEVENTH INTERNATIONAL CONFERENCE ON AFFECTIVE COMPUTING AND INTELLIGENT INTERACTION (ACII), 2017, : 262 - 266
  • [7] Cross-lingual Automatic Speech Recognition Exploiting Articulatory Features
    Zhan, Qingran
    Motlicek, Petr
    Du, Shixuan
    Shan, Yahui
    Ma, Sifan
    Xie, Xiang
    2019 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2019, : 1912 - 1916
  • [8] Cross-Lingual Automatic Speech Recognition Using Tandem Features
    Lal, Partha
    King, Simon
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2013, 21 (12): : 2506 - 2515
  • [9] UNSUPERVISED CROSS-LINGUAL SPEECH EMOTION RECOGNITION USING PSEUDO MULTILABEL
    Li, Fin
    Yan, Nan
    Wang, Lan
    2021 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU), 2021, : 366 - 373
  • [10] Multilingual, Cross-lingual, and Monolingual Speech Emotion Recognition on EmoFilm Dataset
    Atmaja, Bagus Tris
    Sasou, Akira
    2023 ASIA PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE, APSIPA ASC, 2023, : 1019 - 1025