Improving wav2vec2-based Spoken Language Identification by Learning Phonological Features

Cited by: 1
Authors
Shahin, Mostafa [1]
Nan, Zheng [1]
Sethu, Vidhyasaharan [1]
Ahmed, Beena [1]
Affiliations
[1] UNSW, School of Electrical Engineering and Telecommunications, Sydney, NSW, Australia
Source
INTERSPEECH 2023
Keywords
language identification; speech attributes; wav2vec2; code-switching
DOI
10.21437/Interspeech.2023-2533
Chinese Library Classification
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Spoken language identification (SLI) is a key component in speech-processing tools such as spoken language understanding. In code-switching conversational speech, speakers change languages for short durations, which poses an additional challenge to language identification techniques. In this work, we investigate the ability of a wav2vec2-based SLI method to identify the spoken language in English/Mandarin code-switching child-directed conversational speech recorded via Zoom. The proposed system allows the pre-trained wav2vec2-based model to learn language-dependent phonological features by fine-tuning it first to detect manners and places of articulation, and then to classify speech segments as English or Mandarin. The system was tested on parent-child Zoom recordings provided as part of the MERLIon CCS language identification challenge. It achieved the best balanced accuracy of 81.3% and the second-lowest equal error rate of 10.6%.
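The two metrics reported in the abstract can be illustrated with a minimal pure-Python sketch. This is not the challenge's official scoring code: the function names `balanced_accuracy` and `equal_error_rate` are hypothetical, the binary labelling (1 for one language, 0 for the other) is an assumption, and the equal error rate is approximated by sweeping the observed scores as thresholds.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so a majority class cannot dominate the score."""
    classes = sorted(set(y_true))
    recalls = []
    for c in classes:
        total = sum(1 for t in y_true if t == c)
        correct = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        recalls.append(correct / total)
    return sum(recalls) / len(recalls)


def equal_error_rate(labels, scores):
    """Approximate EER for binary labels (1 = target, 0 = non-target).

    Every observed score is tried as a decision threshold (accept as target
    when score >= threshold). The function returns the mean of the false
    acceptance rate and false rejection rate at the threshold where the two
    rates are closest.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best_gap, eer = float("inf"), None
    for thr in sorted(set(scores)):
        far = sum(1 for l, s in zip(labels, scores) if l == 0 and s >= thr) / n_neg
        frr = sum(1 for l, s in zip(labels, scores) if l == 1 and s < thr) / n_pos
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer


if __name__ == "__main__":
    # Toy segment-level labels and predictions (illustrative only, not challenge data).
    print(balanced_accuracy(["en", "en", "en", "zh"], ["en", "en", "zh", "zh"]))
    print(equal_error_rate([1, 1, 1, 0, 0, 0], [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]))
```

Balanced accuracy averages recall over the two languages, which matters when one language dominates the recordings; the equal error rate summarizes the trade-off between accepting the wrong language and rejecting the right one at a single operating point.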
Pages: 4119-4123
Page count: 5