Contribution of modulation spectral features for cross-lingual speech emotion recognition under noisy reverberant conditions

Cited by: 0
Authors
Guo, Taiyang [1]
Li, Sixia [1]
Kidani, Shunsuke [1]
Okada, Shogo [1]
Unoki, Masashi [1]
Affiliations
[1] Japan Adv Inst Sci & Technol, 1-1 Asahidai, Nomi, Ishikawa 9231292, Japan
Funding
Japan Society for the Promotion of Science
Keywords
DOI
10.1109/APSIPAASC58517.2023.10317449
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Handling multiple languages under noisy reverberant conditions has become increasingly important for speech emotion recognition (SER). Previous studies found that modulation spectral features (MSFs) are robust to noisy reverberant conditions for SER. However, they mainly focused on specific languages; the universality of MSFs across languages is still unclear. To address this issue, we compared MSFs, hand-crafted features, Wav2Vec2.0-based features, and MSFs+hand-crafted features for SER on four languages under 12 noisy reverberant conditions. Intra-lingual results showed that MSFs+hand-crafted features performed best under most conditions for all languages. Inter-lingual results showed that MSFs performed best under most conditions for the test languages, except when training on a tonal language and testing on the others. The results demonstrate that MSFs are robust for multilingual SER under noisy reverberant conditions and suggest that MSFs are potentially language-independent features for nontonal languages.
Pages: 2221-2227
Number of pages: 7
Related papers
50 in total
  • [21] Automatic speech emotion recognition using modulation spectral features
    Wu, Siqing
    Falk, Tiago H.
    Chan, Wai-Yip
    SPEECH COMMUNICATION, 2011, 53 (05) : 768 - 785
  • [22] Speech Enhancement and Recognition of Compressed Speech Signal in Noisy Reverberant Conditions
    Suman, Maloji
    Khan, Habibulla
    Latha, M. Madhavi
    Kumari, Devarakonda Aruna
    PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON INFORMATION SYSTEMS DESIGN AND INTELLIGENT APPLICATIONS 2012 (INDIA 2012), 2012, 132 : 379 - +
  • [23] CLIoS: Cross-lingual Induction of Speech Recognition Grammars
    Perera, Nadine
    Pitz, Michael
    Pinkal, Manfred
    SIXTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, LREC 2008, 2008, : 2487 - 2494
  • [24] Unsupervised Cross-lingual Representation Learning for Speech Recognition
    Conneau, Alexis
    Baevski, Alexei
    Collobert, Ronan
    Mohamed, Abdelrahman
    Auli, Michael
    INTERSPEECH 2021, 2021, : 2426 - 2430
  • [25] Enhancement of speech intelligibility under noisy reverberant conditions based on modulation spectrum concept
    Van Ngo, Thuan
    Ho, Tuan Vu
    Unoki, Masashi
    Kubo, Rieko
    Akagi, Masato
    2020 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2020, : 753 - 758
  • [26] Bangla Speech Emotion Recognition and Cross-Lingual Study Using Deep CNN and BLSTM Networks
    Sultana, Sadia
    Iqbal, M. Zafar
    Selim, M. Reza
    Rashid, Md. Mijanur
    Rahman, M. Shahidur
    IEEE ACCESS, 2022, 10 : 564 - 578
  • [27] Optimal trained ensemble of classification model for speech emotion recognition: Considering cross-lingual and multi-lingual scenarios
    Kawade, Rupali Ramdas
    Jagtap, Sonal K.
    MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 83 (18) : 54331 - 54365
  • [28] Optimal trained ensemble of classification model for speech emotion recognition: Considering cross-lingual and multi-lingual scenarios
    Rupali Ramdas Kawade
    Sonal K. Jagtap
    Multimedia Tools and Applications, 2024, 83 : 54331 - 54365
  • [29] Adverse Conditions and Techniques for Cross-Lingual Text Recognition
    Kaur, Achint
    Shrawankar, Urmila
    2017 INTERNATIONAL CONFERENCE ON INNOVATIVE MECHANISMS FOR INDUSTRY APPLICATIONS (ICIMIA), 2017, : 70 - 74
  • [30] Cross-Lingual Acoustic modeling for Dialectal Arabic Speech Recognition
    Elmahdy, Mohamed
    Gruhn, Rainer
    Minker, Wolfgang
    Abdennadher, Slim
    11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 1-2, 2010, : 873 - +