Enhancing Language Identification in Indian Context Through Exploiting Learned Features with Wav2Vec2.0

Cited by: 2

Authors:

Gupta, Shivang ^{[1]}

Motepalli, Kowshik Siva Sai ^{[1]}

Kumar, Ravi ^{[1]}

Narasinga, Vamsi ^{[1]}

Mirishkar, Sai Ganesh ^{[1]}

Vuppala, Anil Kumar ^{[1]}

Affiliation:

[1] Int Inst Informat Technol Hyderabad, Hyderabad, India

Source:

SPEECH AND COMPUTER, SPECOM 2023, PT II | 2023 / Vol. 14339

Keywords:

Language identification; Wav2vec2.0; Self-attention mechanism; Equal error rate;

DOI:

10.1007/978-3-031-48312-7_40

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

This work proposes the utilization of a self-supervised pre-trained network for developing a Language Identification (LID) system catering to low-resource Indian languages. The framework employed is Wav2vec2.0-XLSR-53, pre-trained on 53k hours of unlabeled speech data. The unsupervised training of the model enables it to learn the acoustic patterns specific to a language. Given that languages share phonetic space, multi-lingual pre-training is instrumental in learning cross-lingual information and building systems that cater to low-resource languages. Further fine-tuning with a limited amount of labeled data significantly boosts the model's accuracy. The results showcase a relative improvement of 33.2% over the DNN-A (DNN with attention) model and 19.04% over Dense Resnets for the Language Identification task on the IIITH-ILSC database using the proposed features.

(Shivang Gupta and Kowshik Siva Sai Motepalli share first authorship.)
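
The abstract reports its gains as relative improvements. As a minimal sketch of how such a figure is usually computed, the snippet below assumes the metric is equal error rate (EER, listed in the keywords), where lower is better. The record does not state which metric the percentages refer to. The function name and the numeric values are hypothetical and are not taken from the paper.

```python
def relative_improvement(baseline_eer: float, proposed_eer: float) -> float:
    """Relative reduction in EER, in percent (assumes lower EER is better)."""
    if baseline_eer <= 0:
        raise ValueError("baseline_eer must be positive")
    return 100.0 * (baseline_eer - proposed_eer) / baseline_eer


# Hypothetical EER values chosen only for illustration; not results from the paper.
baseline = 10.0
proposed = 8.0
print(f"{relative_improvement(baseline, proposed):.1f}%")  # prints "20.0%"
```

Under this definition, a drop from a 10.0% EER to an 8.0% EER is a 20% relative improvement, even though the absolute difference is only 2 percentage points.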


Pages: 503-512

Number of pages: 10

