ProsAudit, a prosodic benchmark for self-supervised speech models

Cited by: 0
Authors
de Seyssel, Maureen [1, 2]
Lavechin, Marvin [1, 6]
Titeux, Hadrien [1]
Thomas, Arthur [7]
Virlet, Gwendal [5, 7]
Revilla, Andrea Santos [7]
Wisniewski, Guillaume [2]
Ludusan, Bogdan [3, 4]
Dupoux, Emmanuel [1, 6]
Affiliations
[1] PSL Res Univ, Cognit Machine Learning, EHESS, ENS, CNRS, INRIA, Paris, France
[2] Univ Paris Cite, CNRS, Lab Linguist Formelle, Paris, France
[3] Bielefeld Univ, Fac Linguist & Literary Studies, Bielefeld, Germany
[4] Bielefeld Univ, CITEC, Bielefeld, Germany
[5] INRAE, Inst Agro, PEGASE, St Gilles, France
[6] Meta AI Res, Paris, France
[7] CoML, Paris, France
Source
Keywords
prosody; speech representation; self-supervised learning; human evaluation; spoken language
DOI
10.21437/Interspeech.2023-438
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
We present ProsAudit, a benchmark in English to assess structural prosodic knowledge in self-supervised learning (SSL) speech models. It consists of two subtasks, their corresponding metrics, and an evaluation dataset. In the protosyntax task, the model must correctly identify strong versus weak prosodic boundaries. In the lexical task, the model needs to correctly distinguish between pauses inserted between words and within words. We also provide human evaluation scores on this benchmark. We evaluated a series of SSL models and found that they were all able to perform above chance on both tasks, even when evaluated on an unseen language. However, non-native models performed significantly worse than native ones on the lexical task, highlighting the importance of lexical knowledge in this task. We also found a clear effect of size with models trained on more data performing better in the two subtasks.
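The abstract reports that models score "above chance" on both subtasks but does not spell out the metric. As a minimal sketch, assuming each item is a pair of stimuli (for example, a pause at a strong boundary versus a weak one, or between words versus within a word) and that the model assigns each stimulus a scalar score, accuracy could be computed as below. The function name `pairwise_accuracy`, the tie handling, and the example scores are hypothetical and are not taken from the paper.

```python
from typing import Iterable, Tuple


def pairwise_accuracy(score_pairs: Iterable[Tuple[float, float]]) -> float:
    """Fraction of pairs in which the first (expected-preferred) stimulus
    receives a strictly higher score than the second.

    Ties count as half a point, so an uninformative model scores 0.5.
    This is an assumed metric for illustration only.
    """
    pairs = list(score_pairs)
    if not pairs:
        raise ValueError("need at least one pair of scores")
    total = 0.0
    for preferred, dispreferred in pairs:
        if preferred > dispreferred:
            total += 1.0
        elif preferred == dispreferred:
            total += 0.5
    return total / len(pairs)


# Hypothetical model scores (e.g. log-probabilities) for four stimulus pairs:
# (score of natural pause placement, score of unnatural pause placement)
example_pairs = [(-1.2, -3.4), (-2.0, -1.5), (-0.7, -0.9), (-4.1, -4.1)]
print(pairwise_accuracy(example_pairs))  # 0.625
```

Under this assumption, chance level is 0.5, which is the baseline the "above chance" claim would be measured against.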
Pages: 2963-2967
Number of pages: 5