Common latent representation learning for low-resourced spoken language identification

Authors
Chen, Chen [1, 2]
Bu, Yulin [1]
Chen, Yong [1]
Chen, Deyun [1, 2]
Affiliations
[1] Harbin Univ Sci & Technol, Sch Comp Sci & Technol, Harbin 150080, Heilongjiang, Peoples R China
[2] Harbin Univ Sci & Technol, Postdoctoral Res Stn Comp Sci & Technol, Harbin 150080, Heilongjiang, Peoples R China
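Funding
Natural Science Foundation of Heilongjiang Province; China Postdoctoral Science Foundation; National Natural Science Foundation of China
Keywords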
Spoken language identification; Total variability space; I-vector; Common latent representation learning; RECOGNITION; SPEECH;
DOI
10.1007/s11042-023-16865-x
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
The i-vector method is one of the mainstream methods in spoken language identification (SLID). It estimates the total variability space (TVS) to obtain a low-rank representation that characterizes the language, called the i-vector. On small-scale datasets, however, scarce training data can significantly degrade the performance of an SLID system, so improving performance under low-resource conditions is necessary. In this paper, we propose a common latent representation learning (CLRL) method for learning the TVS, which introduces prior information to compensate for the lack of information under low-resource conditions. The prior information consists of category labels and a prior hypothesis on the parameters. The CLRL method is evaluated on the OLR2020 dataset. Compared with other state-of-the-art methods, it performs better on all datasets across the different data scales. Moreover, it effectively improves SLID performance on low-resource, small-scale datasets.
Pages: 34515-34535
Page count: 21