MULTI-LINGUAL SPEECH RECOGNITION WITH LOW-RANK MULTI-TASK DEEP NEURAL NETWORKS

Authors
Mohan, Aanchan [1]
Rose, Richard [1]
Affiliation
[1] McGill Univ, Dept Elect & Comp Engn, Montreal, PQ, Canada
Keywords
Low-resource speech recognition; Multi-lingual speech recognition; Neural Networks for speech recognition; Multi-task Learning;
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification code
070206; 082403;
Abstract
Multi-task learning (MTL) for deep neural network (DNN) multi-lingual acoustic models has been shown to be effective for learning parameters that are common or shared between multiple languages [1, 2]. In the MTL paradigm, the number of parameters in the output layer is large and scales with the number of languages used in training, so this output layer becomes a computational bottleneck. For mono-lingual DNNs, low-rank matrix factorization (LRMF) of weight matrices has yielded large computational savings [3, 4]. In the LRMF proposed in this work for MTL, the original language-specific block matrices "share" a common matrix, resulting in low-rank language-specific block matrices. The impact of LRMF is presented in two scenarios: (a) improving performance in a target language when auxiliary languages are included during multi-lingual training; and (b) cross-language transfer to an unseen language with only 1 hour of transcribed training data. A 44% parameter reduction in the final layer yields a lower memory footprint and faster training times. An experimental study shows that the LRMF multi-lingual DNN provides competitive performance compared to a full-rank multi-lingual DNN in both scenarios.
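To make the source of the parameter savings concrete, the following is a minimal sketch of how a shared-factor output layer could reduce the parameter count. The abstract does not give the exact factorization, so this assumes each language-specific block W_l (hidden_dim x outputs_l) is replaced by a shared matrix A (hidden_dim x rank) multiplied by a language-specific matrix B_l (rank x outputs_l). The function names and all layer sizes are hypothetical, biases are ignored, and the numbers are not taken from the paper.

```python
def full_rank_params(hidden_dim, outputs_per_language):
    """Weights in a full-rank MTL output layer: one block per language."""
    return sum(hidden_dim * n_out for n_out in outputs_per_language)


def shared_low_rank_params(hidden_dim, rank, outputs_per_language):
    """Weights when every block is factored as A @ B_l with A shared.

    Assumed form (not confirmed by the abstract):
      A   : hidden_dim x rank, shared across languages
      B_l : rank x outputs_l, one per language
    """
    shared = hidden_dim * rank
    language_specific = sum(rank * n_out for n_out in outputs_per_language)
    return shared + language_specific


def relative_reduction(hidden_dim, rank, outputs_per_language):
    """Fraction of output-layer weights removed by the factorization."""
    full = full_rank_params(hidden_dim, outputs_per_language)
    low = shared_low_rank_params(hidden_dim, rank, outputs_per_language)
    return 1.0 - low / full


if __name__ == "__main__":
    # Hypothetical sizes chosen only for illustration.
    hidden_dim = 1024
    rank = 512
    outputs_per_language = [2000, 2000, 2000]

    print(full_rank_params(hidden_dim, outputs_per_language))
    print(shared_low_rank_params(hidden_dim, rank, outputs_per_language))
    print(round(relative_reduction(hidden_dim, rank, outputs_per_language), 3))
```

Under this assumed form, the full-rank layer grows as hidden_dim times the total number of outputs, while the factored layer pays hidden_dim times rank once and then only rank times the outputs for each added language. The saving therefore depends on the chosen rank and on how many languages share the common matrix.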
Pages: 4994-4998
Page count: 5
Related papers
50 entries in total
  • [1] MULTI-LINGUAL DEEP NEURAL NETWORKS FOR LANGUAGE RECOGNITION
    Marcos, Luis Murphy
    Richardson, Frederick
    [J]. 2016 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2016), 2016, : 330 - 334
  • [2] MULTI-LINGUAL MULTI-TASK SPEECH EMOTION RECOGNITION USING WAV2VEC 2.0
    Sharma, Mayank
    [J]. 2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 6907 - 6911
  • [3] Adversarial Multi-task Learning of Deep Neural Networks for Robust Speech Recognition
    Shinohara, Yusuke
    [J]. 17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 2369 - 2372
  • [4] Multi-Task Based Mispronunciation Detection of Children Speech Using Multi-Lingual Information
    Wei, Linxuan
    Dong, Wenwei
    Lin, Binghuai
    Zhang, Jinsong
    [J]. 2019 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2019, : 1791 - 1794
  • [5] A Multi-lingual Multi-task Architecture for Low-resource Sequence Labeling
    Lin, Ying
    Yang, Shengqi
    Stoyanov, Veselin
    Ji, Heng
    [J]. PROCEEDINGS OF THE 56TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL), VOL 1, 2018, : 799 - 809
  • [6] MULTI-TASK JOINT-LEARNING OF DEEP NEURAL NETWORKS FOR ROBUST SPEECH RECOGNITION
    Qian, Yanmin
    Yin, Maofan
    You, Yongbin
    Yu, Kai
    [J]. 2015 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING (ASRU), 2015, : 310 - 316
  • [7] Multi-Lingual Unsupervised Acoustic Modeling Using Multi-Task Deep Neural Network under Mismatch Conditions
    Yao Haitao
    Xu Ji
    Liu Jian
    [J]. PROCEEDINGS OF 2016 8TH IEEE INTERNATIONAL CONFERENCE ON COMMUNICATION SOFTWARE AND NETWORKS (ICCSN 2016), 2016, : 139 - 144
  • [8] Multi-lingual character recognition using Artificial Neural Networks
    Meiyappan, SS
    Sridharan, S
    Ososanya, ET
    [J]. PROCEEDINGS OF THE IEEE SOUTHEASTCON '96: BRINGING TOGETHER EDUCATION, SCIENCE AND TECHNOLOGY, 1996, : 417 - 420
  • [9] A multi-lingual speech recognition system using a neural network approach
    Chen, OTC
    Chen, CY
    Chang, HT
    Hsu, FR
    Yang, HL
    Lee, YG
    [J]. ICNN - 1996 IEEE INTERNATIONAL CONFERENCE ON NEURAL NETWORKS, VOLS. 1-4, 1996, : 1576 - 1581
  • [10] Multi-task Learning Deep Neural Networks For Speech Feature Denoising
    Huang, Bin
    Ke, Dengfeng
    Zheng, Hao
    Xu, Bo
    Xu, Yanyan
    Su, Kaile
    [J]. 16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 2464 - 2468