Improving wav2vec2-based Spoken Language Identification by Learning Phonological Features

Cited by: 1
Authors
Shahin, Mostafa [1]
Nan, Zheng [1]
Sethu, Vidhyasaharan [1]
Ahmed, Beena [1]
Affiliations
[1] UNSW, Sch Elect Engn & Telecommun, Sydney, NSW, Australia
Source
Keywords
language identification; speech attributes; wav2vec2; code-switching
DOI
10.21437/Interspeech.2023-2533
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Spoken language identification (SLI) is a key component of speech-processing tools such as spoken language understanding systems. In code-switching conversational speech, speakers switch languages for short durations, which poses an additional challenge for language identification techniques. In this work, we investigate the ability of a wav2vec2-based SLI method to identify the spoken language in English/Mandarin code-switching child-directed conversational speech recorded via Zoom. The proposed system allows the pre-trained wav2vec2-based model to learn language-dependent phonological features by fine-tuning it first to detect manners and places of articulation, and then to classify speech segments as English or Mandarin. The proposed system was tested on parent-child Zoom recordings provided as part of the MERLIon CCS language identification challenge. The system achieved the best balanced accuracy, 81.3%, and the second-lowest equal error rate, 10.6%.
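The two metrics reported in the abstract can be illustrated with a minimal sketch. This is not the challenge's official scoring code: the function names, the 0/1 label encoding, and the threshold sweep used to approximate the equal error rate are all assumptions made for illustration.

```python
from typing import Sequence


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of per-class recall, so each language counts equally
    even when one language dominates the recordings."""
    recalls = []
    for c in sorted(set(y_true)):
        total = sum(1 for t in y_true if t == c)
        correct = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        recalls.append(correct / total)
    return sum(recalls) / len(recalls)


def equal_error_rate(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Approximate EER by sweeping thresholds over the observed scores.

    Assumption: y_true uses 1 for the target language and 0 for the other,
    and a higher score means "more likely the target language".
    """
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    best_gap, eer = float("inf"), 1.0
    for thr in sorted(set(scores)) + [float("inf")]:
        # A segment is accepted as the target language when score >= thr.
        false_accepts = sum(1 for t, s in zip(y_true, scores) if t == 0 and s >= thr)
        misses = sum(1 for t, s in zip(y_true, scores) if t == 1 and s < thr)
        far, frr = false_accepts / n_neg, misses / n_pos
        # Keep the threshold where the two error rates are closest.
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer


# Toy data only; these are not results from the paper.
labels = [0, 0, 0, 0, 1, 1]
preds = [0, 0, 0, 1, 1, 0]
print(balanced_accuracy(labels, preds))

trial_labels = [1, 1, 1, 1, 0, 0, 0, 0]
trial_scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.4, 0.2, 0.1]
print(equal_error_rate(trial_labels, trial_scores))
```

Balanced accuracy averages the recall of each language, which matters when one language accounts for most segments; the equal error rate is the operating point at which the false-acceptance and false-rejection rates coincide.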
Pages: 4119-4123
Number of pages: 5
Related papers
50 in total
  • [1] Wav2vec2-based Paralinguistic Systems to Recognise Vocalised Emotions and Stuttering
    Grosz, Tamas
    Porjazovski, Dejan
    Getman, Yaroslav
    Kadiri, Sudarsana Reddy
    Kurimo, Mikko
    PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022, : 7026 - 7029
  • [2] Improving Mispronunciation Detection with Wav2vec2-based Momentum Pseudo-Labeling for Accentedness and Intelligibility Assessment
    Yang, Mu
    Hirschi, Kevin
    Looney, Stephen D.
    Kang, Okim
    Hansen, John H. L.
    INTERSPEECH 2022, 2022, : 4481 - 4485
  • [3] wav2vec2-based Speech Rating System for Children with Speech Sound Disorder
    Getman, Yaroslav
    Al-Ghezi, Ragheb
    Voskoboinik, Ekaterina
    Grosz, Tamas
    Kurimo, Mikko
    Salvi, Giampiero
    Svendsen, Torbjorn
    Strombergsson, Sofia
    INTERSPEECH 2022, 2022, : 3618 - 3622
  • [4] End to End Spoken Language Diarization with Wav2vec Embeddings
    Mishra, Jagabandhu
    Patil, Jayadev N.
    Chowdhury, Amartya
    Prasanna, S. R. Mahadeva
    INTERSPEECH 2023, 2023, : 501 - 505
  • [5] A WAV2VEC2-Based Experimental Study on Self-Supervised Learning Methods to Improve Child Speech Recognition
    Jain, Rishabh
    Barcovschi, Andrei
    Yiwere, Mariam Yahayah
    Bigioi, Dan
    Corcoran, Peter
    Cucu, Horia
    IEEE ACCESS, 2023, 11 : 46938 - 46948
  • [6] What Can an Accent Identifier Learn? Probing Phonetic and Prosodic Information in a Wav2vec2-based Accent Identification Model
    Yang, Mu
    Shekar, Ram C. M. C.
    Kang, Okim
    Hansen, John H. L.
    INTERSPEECH 2023, 2023, : 1923 - 1927
  • [7] Exploring wav2vec 2.0 on speaker verification and language identification
    Fan, Zhiyun
    Li, Meng
    Zhou, Shiyu
    Xu, Bo
    INTERSPEECH 2021, 2021, : 1509 - 1513
  • [8] Enhancing Language Identification in Indian Context Through Exploiting Learned Features with Wav2Vec2.0
    Gupta, Shivang
    Motepalli, Kowshik Siva Sai
    Kumar, Ravi
    Narasinga, Vamsi
    Mirishkar, Sai Ganesh
    Vuppala, Anil Kumar
    SPEECH AND COMPUTER, SPECOM 2023, PT II, 2023, 14339 : 503 - 512
  • [9] Exploring Aggregated wav2vec 2.0 Features and Dual-Stream TDNN for Efficient Spoken Dialect Identification
    Angra, Ananya
    Muralikrishna, H.
    Dinesh, Dileep Aroor
    Thenkanidiyoor, Veena
    IEEE ACCESS, 2025, 13 : 3115 - 3129
  • [10] Unsupervised Spoken Term Discovery Using wav2vec 2.0
    Iwamoto, Yu
    Shinozaki, Takahiro
    2021 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2021, : 1082 - 1086