Towards Cross-Corpora Generalization for Low-Resource Spoken Language Identification

Cited by: 0
Authors
Dey, Spandan [1 ,2 ]
Sahidullah, Md [3 ,4 ]
Saha, Goutam [1 ]
Affiliations
[1] Indian Inst Technol Kharagpur, Dept E & ECE, Kharagpur 721302, India
[2] Samsung R&D Inst India Bangalore, Bengaluru 560037, India
[3] TCG CREST, Inst Adv Intelligence, Bidhannagar 700091, India
[4] Acad Sci & Innovat Res, Ghaziabad 201002, India
Keywords
Vectors; Training; Correlation; Speech processing; Recording; Noise; Measurement; Databases; Training data; NIST; Spoken language identification; low-resource; cross-corpora evaluation; corpora mismatch; domain invariance
DOI
10.1109/TASLP.2024.3492807
Chinese Library Classification (CLC)
O42 [Acoustics]
Subject Classification Codes
070206; 082403
Abstract
Low-resource spoken language identification (LID) systems are prone to poor generalization across unknown domains. In this study, using several widely used low-resource South Asian LID corpora, we conduct an in-depth analysis to understand the key non-lingual bias factors that create corpora mismatch and degrade LID generalization. To quantify the biases, we extract different data-driven and rule-based summary vectors that capture non-lingual aspects such as speaker characteristics, spoken context, accents or dialects, recording channels, background noise, and environments. We then conduct a statistical analysis to identify the non-lingual bias factors and corpora-mismatch components that most strongly affect LID performance. Following these analyses, we propose effective bias-compensation approaches for the most relevant summary vectors: we generate pseudo-labels using hierarchical clustering over language-domain-gender constrained summary vectors and use them to train adversarial networks with a conditioned metric loss. The compensations learn invariance to the corpora mismatches caused by the non-lingual biases and thereby improve generalization. With the proposed compensation method, we improve the equal error rate by up to 5.22% and 8.14% for same-corpora and cross-corpora evaluations, respectively.
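The pseudo-labeling step described in the abstract — hierarchical clustering applied separately within each language-domain-gender constrained group of summary vectors — could be sketched roughly as below. This is an illustrative assumption built on SciPy's agglomerative clustering, not the authors' actual implementation; the function name, toy data, and cluster count are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster


def pseudo_labels(vectors, groups, n_clusters=2):
    """Cluster summary vectors independently inside each constrained group
    (e.g. one group per language-domain-gender triple) and return
    globally unique pseudo-cluster labels. Illustrative sketch only."""
    labels = np.zeros(len(vectors), dtype=int)
    offset = 0  # keeps labels from different groups disjoint
    for g in np.unique(groups):
        idx = np.where(groups == g)[0]
        if len(idx) < n_clusters:
            # too few samples to split: assign the whole group one label
            labels[idx] = offset
            offset += 1
            continue
        # Ward-linkage hierarchical clustering within the group
        Z = linkage(vectors[idx], method="ward")
        c = fcluster(Z, t=n_clusters, criterion="maxclust")  # labels 1..k
        labels[idx] = offset + c - 1
        offset += c.max()
    return labels


# toy demo: 2-D "summary vectors" from two constrained groups;
# group 0 contains two well-separated clouds, group 1 one cloud
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.1, (5, 2)),
               rng.normal(3.0, 0.1, (5, 2)),
               rng.normal(0.0, 0.1, (5, 2))])
g = np.array([0] * 10 + [1] * 5)
y = pseudo_labels(X, g, n_clusters=2)
```

The pseudo-labels `y` would then supply the targets for the adversarial branch, so that the network learns representations invariant to these non-lingual clusters.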
Pages: 5040-5050
Page count: 11
Related Papers
50 in total
  • [31] LOW-RESOURCE CONTEXTUAL TOPIC IDENTIFICATION ON SPEECH
    Liu, Chunxi
    Wiesner, Matthew
    Watanabe, Shinji
    Harman, Craig
    Trmal, Jan
    Dehak, Najim
    Khudanpur, Sanjeev
    2018 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2018), 2018, : 656 - 663
  • [32] Capsule Networks for Low Resource Spoken Language Understanding
    Renkens, Vincent
    Van Hamme, Hugo
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 601 - 605
  • [33] MULTITASK LEARNING FOR LOW RESOURCE SPOKEN LANGUAGE UNDERSTANDING
    Meeus, Quentin
    Moens, Marie Francine
    Van Hamme, Hugo
    INTERSPEECH 2022, 2022, : 4073 - 4077
  • [34] MC-SLT: Towards Low-Resource Signer-Adaptive Sign Language Translation
    Jin, Tao
    Zhao, Zhou
    Zhang, Meng
    Zeng, Xingshan
    PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022, : 4939 - 4947
  • [35] Towards Low-Resource Automatic Program Repair with Meta-Learning and Pretrained Language Models
    Wang, Weishi
    Wang, Yue
    Hoi, Steven C. H.
    Joty, Shafiq
    2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING, EMNLP 2023, 2023, : 6954 - 6968
  • [36] Cross-lingual offensive speech identification with transfer learning for low-resource languages
    Shi, Xiayang
    Liu, Xinyi
    Xu, Chun
    Huang, Yuanyuan
    Chen, Fang
    Zhu, Shaolin
    COMPUTERS & ELECTRICAL ENGINEERING, 2022, 101
  • [37] Hybrid Approach Text Generation for Low-Resource Language
    Rakhimova, Diana
    Adali, Esref
    Karibayeva, Aidana
    ADVANCES IN COMPUTATIONAL COLLECTIVE INTELLIGENCE, ICCCI 2024, PART I, 2024, 2165 : 256 - 268
  • [38] A Scheme for News Article Classification in a Low-Resource Language
    Yohannes, Hailemariam Mehari
    Amagasa, Toshiyuki
    INFORMATION INTEGRATION AND WEB INTELLIGENCE, IIWAS 2022, 2022, 13635 : 519 - 530
  • [39] Low-resource Taxonomy Enrichment with Pretrained Language Models
    Takeoka, Kunihiro
    Akimoto, Kosuke
    Oyamada, Masafumi
    2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021), 2021, : 2747 - 2758
  • [40] Natural language processing applications for low-resource languages
    Pakray, Partha
    Gelbukh, Alexander
    Bandyopadhyay, Sivaji
    NATURAL LANGUAGE PROCESSING, 2025, 31 (02): : 183 - 197