Common latent representation learning for low-resourced spoken language identification

Cited by: 0
Authors
Chen, Chen [1 ,2 ]
Bu, Yulin [1 ]
Chen, Yong [1 ]
Chen, Deyun [1 ,2 ]
Affiliations
[1] Harbin Univ Sci & Technol, Sch Comp Sci & Technol, Harbin 150080, Heilongjiang, Peoples R China
[2] Harbin Univ Sci & Technol, Postdoctoral Res Stn Comp Sci & Technol, Harbin 150080, Heilongjiang, Peoples R China
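Funding
Natural Science Foundation of Heilongjiang Province; China Postdoctoral Science Foundation; National Natural Science Foundation of China;
Keywords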
Spoken language identification; Total variability space; I-vector; Common latent representation learning; RECOGNITION; SPEECH;
DOI
10.1007/s11042-023-16865-x
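Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract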
The i-vector method is one of the mainstream methods in spoken language identification (SLID). It estimates the total variability space (TVS) to obtain a low-rank representation that characterizes the language, called the i-vector. However, on small-scale datasets, limited training resources can significantly degrade the performance of an SLID system, so improving SLID performance under low-resourced conditions is necessary. In this paper, we propose a common latent representation learning (CLRL) method to learn the TVS, which introduces prior information to compensate for the lack of information in low-resourced conditions. The prior information includes category labels and a parameter prior hypothesis. The CLRL method is evaluated on the OLR2020 dataset. Compared with other state-of-the-art methods, the CLRL method performs better on datasets of every data scale tested. Moreover, the CLRL method can effectively improve the performance of the SLID system on low-resourced/small-scale datasets.
Pages: 34515-34535
Number of pages: 21
Related papers
50 in total
  • [41] Identification of Spoken Language using Machine Learning Approach
    Shahriar, Md Asif
    Aziz, Iftekhar
    Banik, Shovan
    Sattar, Abdus
    2020 23RD INTERNATIONAL CONFERENCE ON COMPUTER AND INFORMATION TECHNOLOGY (ICCIT 2020), 2020,
  • [42] Deep Learning Transformer Architecture for Named-Entity Recognition on Low-Resourced Languages: State of the art results
    Hanslo, Ridewaan
    PROCEEDINGS OF THE 2022 17TH CONFERENCE ON COMPUTER SCIENCE AND INTELLIGENCE SYSTEMS (FEDCSIS), 2022, : 53 - 60
  • [43] The Bigger Fish: A Comparison of Meta-Learning QSAR Models on Low-Resourced Aquatic Toxicity Regression Tasks
    Schlender, Thalea
    Viljanen, Markus
    van Rijn, Jan N.
    Mohr, Felix
    Peijnenburg, Willie J. G. M.
    Hoos, Holger H.
    Rorije, Emiel
    Wong, Albert
    ENVIRONMENTAL SCIENCE & TECHNOLOGY, 2023, 57 (46) : 17818 - 17830
  • [44] MULTITASK LEARNING FOR LOW RESOURCE SPOKEN LANGUAGE UNDERSTANDING
    Meeus, Quentin
    Moens, Marie Francine
    Van Hamme, Hugo
    INTERSPEECH 2022, 2022, : 4073 - 4077
  • [45] COUPLED REPRESENTATION LEARNING FOR DOMAINS, INTENTS AND SLOTS IN SPOKEN LANGUAGE UNDERSTANDING
    Lee, Jihwan
    Kim, Dongchan
    Sarikaya, Ruhi
    Kim, Young-Bum
    2018 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2018), 2018, : 714 - 719
  • [46] Subspace-Based Representation and Learning for Phonotactic Spoken Language Recognition
    Lee, Hung-Shin
    Tsao, Yu
    Jeng, Shyh-Kang
    Wang, Hsin-Min
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2020, 28 : 3065 - 3079
  • [47] Robust Latent Common Subspace Learning for Transferable Feature Representation
    Zhan, Shanhua
    Sun, Weijun
    Kang, Peipei
    ELECTRONICS, 2022, 11 (05)
  • [48] Comparative Study on Spoken Language Identification Based on Deep Learning
    Heracleous, Panikos
    Takai, Kohichi
    Yasuda, Keiji
    Mohammad, Yasser
    Yoneyama, Akio
    2018 26TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2018, : 2265 - 2269
  • [49] Deep temporal representation learning for language identification
    Chen, Chen
    Chen, Yong
    Li, Weiwei
    Chen, Deyun
    NEURAL NETWORKS, 2025, 182
  • [50] Feature Representation of Short Utterances based on Knowledge Distillation for Spoken Language Identification
    Shen, Peng
    Lu, Xugang
    Li, Sheng
    Kawai, Hisashi
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 1813 - 1817