Improving wav2vec2-based Spoken Language Identification by Learning Phonological Features

Cited by: 1
Authors
Shahin, Mostafa [1]
Nan, Zheng [1]
Sethu, Vidhyasaharan [1]
Ahmed, Beena [1]
Affiliations
[1] UNSW, Sch Elect Engn & Telecommun, Sydney, NSW, Australia
Source
Keywords
language identification; speech attributes; wav2vec2; code-switching
DOI
10.21437/Interspeech.2023-2533
Chinese Library Classification
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Spoken language identification (SLI) is a key component of speech-processing tools such as spoken language understanding. In code-switching conversational speech, speakers change languages for short durations, which poses an additional challenge to language identification techniques. In this work, we investigate the ability of a wav2vec2-based SLI method to identify the spoken language in English/Mandarin code-switching child-directed conversational speech recorded via Zoom. The proposed system allows the pre-trained wav2vec2-based model to learn language-dependent phonological features by fine-tuning it first to detect manners and places of articulation, then to classify speech segments as English or Mandarin. The proposed system was evaluated on parent-child Zoom recordings provided as part of the MERLIon CCS language identification challenge. The system achieved the best balanced accuracy, 81.3%, and the second-lowest equal error rate, 10.6%.
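The two metrics reported in the abstract can be illustrated with a minimal sketch. It uses the standard textbook definitions of balanced accuracy (mean per-class recall) and equal error rate (the operating point where false-accept and false-reject rates meet); the challenge's exact scoring protocol is not given here, so the function names, the label convention (1 = English, 0 = Mandarin), and the toy scores below are all assumptions made for illustration.

```python
def balanced_accuracy(labels, predictions):
    """Mean of per-class recall, so a majority class cannot dominate the score."""
    recalls = []
    for c in sorted(set(labels)):
        total = sum(1 for y in labels if y == c)
        correct = sum(1 for y, p in zip(labels, predictions) if y == c and p == c)
        recalls.append(correct / total)
    return sum(recalls) / len(recalls)


def equal_error_rate(labels, scores):
    """Approximate EER for binary labels by sweeping thresholds over observed scores.

    Assumed convention: label 1 is the target class and a segment is accepted
    as target when score >= threshold.
    """
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    best_gap, eer = None, None
    for threshold in sorted(set(scores)):
        false_accepts = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
        false_rejects = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
        far = false_accepts / negatives
        frr = false_rejects / positives
        gap = abs(far - frr)
        # Keep the threshold where the two error rates are closest.
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Hypothetical toy data: 1 = English segment, 0 = Mandarin segment.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.4, 0.2, 0.1]
predictions = [1 if s >= 0.5 else 0 for s in scores]

print(balanced_accuracy(labels, predictions))
print(equal_error_rate(labels, scores))
```

Balanced accuracy is the more informative of the two when one language dominates the recordings, since plain accuracy could be high for a system that always predicts the majority language.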
Pages: 4119-4123
Page count: 5