Contribution of modulation spectral features for cross-lingual speech emotion recognition under noisy reverberant conditions

Citations: 0
Authors
Guo, Taiyang [1]
Li, Sixia [1]
Kidani, Shunsuke [1]
Okada, Shogo [1]
Unoki, Masashi [1]
Affiliation
[1] Japan Adv Inst Sci & Technol, 1-1 Asahidai, Nomi, Ishikawa 9231292, Japan
Funding
Japan Society for the Promotion of Science
DOI
10.1109/APSIPAASC58517.2023.10317449
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Handling multiple languages under noisy reverberant conditions has become increasingly important for speech emotion recognition (SER). Previous studies found that modulation spectral features (MSFs) are robust to noisy reverberant conditions for SER. However, those studies mainly focused on specific languages, so the universality of MSFs across languages is still unclear. To address this issue, we compared MSFs, hand-crafted features, Wav2Vec2.0-based features, and MSFs+hand-crafted features for SER on four languages under 12 noisy reverberant conditions. Intra-lingual results showed that MSFs+hand-crafted features performed best under most conditions for all languages. Inter-lingual results showed that MSFs performed best under most conditions for the test languages, except when training on a tonal language and testing on the others. The results demonstrate that MSFs are robust for multilingual SER under noisy reverberant conditions and suggest that MSFs are potentially language-independent features for nontonal languages.
Pages: 2221-2227
Page count: 7