Enhancing Language Identification in Indian Context Through Exploiting Learned Features with Wav2Vec2.0

Cited by: 2
Authors
Gupta, Shivang [1]
Motepalli, Kowshik Siva Sai [1]
Kumar, Ravi [1]
Narasinga, Vamsi [1]
Mirishkar, Sai Ganesh [1]
Vuppala, Anil Kumar [1]
Affiliations
[1] International Institute of Information Technology Hyderabad, Hyderabad, India
Source
Keywords
Language identification; Wav2vec2.0; Self-attention mechanism; Equal error rate
DOI
10.1007/978-3-031-48312-7_40
Chinese Library Classification
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
This work proposes the utilization of a self-supervised pre-trained network for developing a Language Identification (LID) system catering to low-resource Indian languages. The framework employed is Wav2vec2.0-XLSR-53, pre-trained on 53k hours of unlabeled speech data. The unsupervised training of the model enables it to learn the acoustic patterns specific to a language. Given that languages share phonetic space, multi-lingual pre-training is instrumental in learning cross-lingual information and building systems that cater to low-resource languages. Further fine-tuning with a limited amount of labeled data significantly boosts the model's accuracy. The results showcase a relative improvement of 33.2% over the DNN-A (DNN with attention) model and 19.04% over Dense Resnets for the Language Identification task on the IIITH-ILSC database using the proposed features.
Note: Shivang Gupta and Kowshik Siva Sai Motepalli share first authorship.
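The abstract reports relative improvements over two baselines. Assuming these are measured on equal error rate (EER, the metric listed in the keywords), the following is a minimal sketch of how such figures are commonly computed. The function names and the threshold-sweep approach are illustrative assumptions, not the authors' code, and no values from the paper are reproduced here.

```python
def relative_improvement(baseline_eer, proposed_eer):
    """Percentage reduction in EER relative to a baseline (lower EER is better)."""
    if baseline_eer <= 0:
        raise ValueError("baseline_eer must be positive")
    return 100.0 * (baseline_eer - proposed_eer) / baseline_eer


def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by sweeping every observed score as a threshold.

    A trial is accepted when its score is >= the threshold. The function
    returns the mean of the false-rejection and false-acceptance rates at
    the threshold where the two rates are closest.
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, best_eer = float("inf"), None
    for t in thresholds:
        # Fraction of genuine-language trials rejected at this threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # Fraction of other-language trials accepted at this threshold.
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(frr - far)
        if gap < best_gap:
            best_gap, best_eer = gap, (frr + far) / 2
    return best_eer
```

With this definition, a drop in EER from a hypothetical 10.0 to 8.0 would count as a 20% relative improvement, which is the sense in which percentages like those in the abstract are usually read.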
Pages: 503-512
Number of pages: 10
Related papers
25 in total
  • [1] Enhancing Stuttering Detection and Classification using Wav2Vec2.0
    Sen, Madhurima
    Das, Pradip K.
    2024 4TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND SIGNAL PROCESSING, AISP, 2024,
  • [2] Keyword spotting for dialectal speech and Introduction of wav2vec2.0
    Ariga, Tomohiro
    Minakawa, Reo
    Kojima, Kazunori
    Lee, Shi-Wook
    Itoh, Yoshiaki
    APSIPA ASC 2024 - Asia Pacific Signal and Information Processing Association Annual Summit and Conference 2024, 2024,
  • [3] Kazakh Speech Recognition: Wav2vec2.0 vs. Whisper
    Kozhirbayev, Zhanibek
    JOURNAL OF ADVANCES IN INFORMATION TECHNOLOGY, 2023, 14 (06) : 1382 - 1389
  • [4] Exploring wav2vec 2.0 on speaker verification and language identification
    Fan, Zhiyun
    Li, Meng
    Zhou, Shiyu
    Xu, Bo
    INTERSPEECH 2021, 2021, : 1509 - 1513
  • [5] The Graph feature fusion technique for speaker recognition based on wav2vec2.0 framework
    Ge, Zirui
    Guo, Haiyan
    Wang, Tingting
    Yang, Zhen
    arXiv, 2023,
  • [6] Transfer Ability of Monolingual Wav2vec2.0 for Low-resource Speech Recognition
    Yi, Cheng
    Wang, Jianzong
    Cheng, Ning
    Zhou, Shiyu
    Xu, Bo
    2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2021,
  • [7] Damage localization method using ultrasonic lamb waves and Wav2Vec2.0 neural network
    Qian, Lubin
    Liu, Sihao
    Fan, Guopeng
    Liu, Xinlong
    Zhang, Hui
    Mei, Yaohua
    Xing, Yuhui
    Wang, Zhiqiang
    FRONTIERS IN MATERIALS, 2023, 10
  • [8] A study on fine-tuning wav2vec2.0 Model for the task of Mispronunciation Detection and Diagnosis
    Peng, Linkai
    Fu, Kaiqi
    Lin, Binghuai
    Ke, Dengfeng
    Zhan, Jinsong
    INTERSPEECH 2021, 2021, : 4448 - 4452
  • [9] Automatic Speech Disfluency Detection Using wav2vec2.0 for Different Languages with Variable Lengths
    Liu, Jiajun
    Wumaier, Aishan
    Wei, Dongping
    Guo, Shen
    APPLIED SCIENCES-BASEL, 2023, 13 (13):
  • [10] Improving wav2vec2-based Spoken Language Identification by Learning Phonological Features
    Shahin, Mostafa
    Nan, Zheng
    Sethu, Vidhyasaharan
    Ahmed, Beena
    INTERSPEECH 2023, 2023, : 4119 - 4123