Improving wav2vec2-based Spoken Language Identification by Learning Phonological Features

Cited by: 1
Authors
Shahin, Mostafa [1]
Nan, Zheng [1]
Sethu, Vidhyasaharan [1]
Ahmed, Beena [1]
Affiliations
[1] UNSW, Sch Elect Engn & Telecommun, Sydney, NSW, Australia
Source
Keywords
language identification; speech attributes; wav2vec2; code-switching
DOI
10.21437/Interspeech.2023-2533
Chinese Library Classification
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Spoken language identification (SLI) is a key component in speech-processing tools such as spoken language understanding. In code-switching conversational speech, speakers change languages for short durations, posing an additional challenge to language identification techniques. In this work, we investigate the ability of a wav2vec2-based SLI method to identify the spoken language of English/Mandarin code-switching child-directed conversational speech recorded via Zoom. The proposed system allows the pre-trained wav2vec2-based model to learn language-dependent phonological features by fine-tuning it first to detect manners and places of articulation, then to classify English and Mandarin speech segments. The proposed system was tested on parent-child Zoom recordings provided as part of the MERLIon CCS language identification challenge. The system achieved the best balanced accuracy of 81.3% and the second-lowest equal error rate of 10.6%.
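The two metrics reported in the abstract can be illustrated with a minimal sketch, assuming the standard definitions: balanced accuracy as the mean of per-class recalls, and equal error rate (EER) as the operating point where the false acceptance rate equals the false rejection rate. The function names, the threshold sweep, and the toy labels and scores below are illustrative assumptions, not the challenge's official scoring code or data.

```python
from typing import Sequence


def balanced_accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Mean of per-class recalls, so each language counts equally
    even when one language dominates the recordings."""
    recalls = []
    for cls in sorted(set(y_true)):
        total = sum(1 for t in y_true if t == cls)
        correct = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        recalls.append(correct / total)
    return sum(recalls) / len(recalls)


def equal_error_rate(targets: Sequence[int], scores: Sequence[float]) -> float:
    """Approximate EER by sweeping thresholds over the observed scores.

    targets: 1 for the target language, 0 otherwise (assumed encoding).
    scores:  higher means more likely to be the target language.
    Returns the mean of FAR and FRR at the threshold where they are closest.
    """
    n_pos = sum(targets)
    n_neg = len(targets) - n_pos
    best_gap, eer = float("inf"), 1.0
    for thr in sorted(set(scores)):
        # A segment is accepted as the target language when score >= thr.
        far = sum(1 for t, s in zip(targets, scores) if t == 0 and s >= thr) / n_neg
        frr = sum(1 for t, s in zip(targets, scores) if t == 1 and s < thr) / n_pos
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Toy data, invented for illustration only.
toy_true = ["en", "en", "en", "en", "zh", "zh"]
toy_pred = ["en", "en", "en", "zh", "zh", "zh"]
print(balanced_accuracy(toy_true, toy_pred))  # 0.875

toy_targets = [1, 1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.4, 0.2, 0.1]
print(equal_error_rate(toy_targets, toy_scores))  # 0.25
```

Balanced accuracy is informative here because code-switched recordings are typically dominated by one language; plain accuracy would reward a system that always predicts the majority language.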
Pages: 4119-4123
Page count: 5
Related papers
50 in total
  • [31] Confusion2Vec 2.0: Enriching ambiguous spoken language representations with subwords
    Shivakumar, Prashanth Gurunath
    Georgiou, Panayiotis
    Narayanan, Shrikanth
    Shahamiri, Seyed Reza
    PLOS ONE, 2022, 17 (03):
  • [32] Spoken Word2Vec: Learning Skipgram Embeddings from Speech
    Sayeed, Mohammad Amaan
    Aldarmaki, Hanan
    INTERSPEECH 2024, 2024, : 2920 - 2924
  • [33] THE VICOMTECH AUDIO DEEPFAKE DETECTION SYSTEM BASED ON WAV2VEC2 FOR THE 2022 ADD CHALLENGE
    Martin-Donas, Juan M.
    Alvarez, Aitor
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 9241 - 9245
  • [34] Audio Features from the Wav2Vec 2.0 Embeddings for the ACM Multimedia 2022 Stuttering Challenge
    Montacie, Claude
    Caraty, Marie-Jose
    Lackovic, Nikola
    PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022, : 7195 - 7199
  • [35] Automatic spoken language identification using MFCC based time series features
    Mainak Biswas
    Saif Rahaman
    Ali Ahmadian
    Kamalularifin Subari
    Pawan Kumar Singh
    Multimedia Tools and Applications, 2023, 82 : 9565 - 9595
  • [36] Automatic spoken language identification using MFCC based time series features
    Biswas, Mainak
    Rahaman, Saif
    Ahmadian, Ali
    Subari, Kamalularifin
    Singh, Pawan Kumar
    MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 82 (07) : 9565 - 9595
  • [37] Improves Neural Acoustic Word Embeddings Query by Example Spoken Term Detection with Wav2vec Pretraining and Circle Loss
    Li, Zhaoqi
    Wu, Long
    Li, Ta
    Yan, Yonghong
    2021 12TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2021,
  • [38] Improving Speech Translation Accuracy and Time Efficiency With Fine-Tuned wav2vec 2.0-Based Speech Segmentation
    Fukuda, Ryo
    Sudoh, Katsuhito
    Nakamura, Satoshi
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2024, 32 : 906 - 916
  • [39] Balanced-Wav2Vec: Enhancing Stability and Robustness of Representation Learning Through Sample Reweighting Techniques
    Lee, Mun-Hak
    Lee, Jae-Hong
    Kim, DoHee
    Kol, Ye-Eun
    Chang, Joon-Hyuk
    INTERSPEECH 2024, 2024, : 5058 - 5062
  • [40] CTRL: Continual Representation Learning to Transfer Information of Pre-trained for WAV2VEC 2.0
    Lee, Jae-Hong
    Lee, Chae-Won
    Choi, Jin-Seong
    Chang, Joon-Hyuk
    Seong, Woo Kyeong
    Lee, Jeonghan
    INTERSPEECH 2022, 2022, : 3398 - 3402