Enhancing Language Identification in Indian Context Through Exploiting Learned Features with Wav2Vec2.0

Times cited: 2

Authors:

Gupta, Shivang [1]

Motepalli, Kowshik Siva Sai [1]

Kumar, Ravi [1]

Narasinga, Vamsi [1]

Mirishkar, Sai Ganesh [1]

Vuppala, Anil Kumar [1]

Affiliation:

[1] International Institute of Information Technology (IIIT) Hyderabad, Hyderabad, India

Source:

SPEECH AND COMPUTER, SPECOM 2023, Part II | 2023 / Vol. 14339

Keywords:

Language identification; Wav2vec2.0; Self-attention mechanism; Equal error rate

DOI:

10.1007/978-3-031-48312-7_40

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

This work proposes using a self-supervised pre-trained network to develop a Language Identification (LID) system for low-resource Indian languages. The framework employed is Wav2vec2.0-XLSR-53, pre-trained on 53k hours of unlabeled speech data. Self-supervised training enables the model to learn acoustic patterns specific to each language. Because languages share a phonetic space, multilingual pre-training helps the model learn cross-lingual information and supports building systems for low-resource languages. Further fine-tuning with a limited amount of labeled data significantly boosts the model's accuracy. Using the proposed features, the results show a relative improvement of 33.2% over the DNN-A (DNN with attention) model and 19.04% over Dense ResNets for the Language Identification task on the IIITH-ILSC database (Shivang Gupta and Kowshik Siva Sai Motepalli share first authorship).
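The abstract reports relative improvements without naming the underlying metric, while the keywords list equal error rate (EER). Purely as an illustration, assuming the reported figures are relative reductions of an error metric such as EER, the sketch below estimates EER from detection scores by sweeping thresholds and computes a relative reduction. The function names `equal_error_rate` and `relative_improvement` are hypothetical, not taken from the paper, and the scores are made-up toy values.

```python
def equal_error_rate(scores, labels):
    """Approximate the EER by sweeping every observed score as a threshold.

    scores: detection scores, higher meaning "more likely the target language".
    labels: 1 for target trials, 0 for non-target trials.
    Returns (eer, threshold) at the point where false-acceptance and
    false-rejection rates are closest; the EER is their average there.
    """
    n_target = sum(labels)
    n_nontarget = len(labels) - n_target
    if n_target == 0 or n_nontarget == 0:
        raise ValueError("need both target and non-target trials")

    best = None  # (gap between the two rates, eer estimate, threshold)
    for t in sorted(set(scores)):
        # A trial is accepted as "target" when its score is >= t.
        false_accepts = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= t)
        false_rejects = sum(1 for s, y in zip(scores, labels) if y == 1 and s < t)
        far = false_accepts / n_nontarget
        frr = false_rejects / n_target
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


def relative_improvement(baseline, proposed):
    """Relative reduction of an error metric (lower is better), in percent."""
    return 100.0 * (baseline - proposed) / baseline


if __name__ == "__main__":
    # Toy, made-up scores for one target language (1 = target, 0 = non-target).
    scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.1]
    labels = [1, 1, 1, 0, 0, 0]
    eer, threshold = equal_error_rate(scores, labels)
    print(f"EER = {eer:.3f} at threshold {threshold}")  # EER = 0.333 at threshold 0.6
    print(relative_improvement(10.0, 7.5))  # 25.0
```

If the reported metric were accuracy rather than an error rate, the relative-improvement formula would need to be adapted; the abstract does not say which is used.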


Pages: 503-512

Number of pages: 10
