TRANSFER LEARNING OF LANGUAGE-INDEPENDENT END-TO-END ASR WITH LANGUAGE MODEL FUSION

Cited by: 0
Authors
Inaguma, Hirofumi [1]
Cho, Jaejin [2]
Baskar, Murali Karthick [3]
Kawahara, Tatsuya [1]
Watanabe, Shinji [2]
Affiliations
[1] Kyoto Univ, Grad Sch Informat, Kyoto, Japan
[2] Johns Hopkins Univ, Baltimore, MD USA
[3] Brno Univ Technol, Brno, Czech Republic
Keywords
end-to-end ASR; multilingual speech recognition; low-resource language; transfer learning
DOI
10.1109/icassp.2019.8682918
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
This work explores improved methods for adapting to low-resource languages using an external language model (LM) within a transfer learning framework. We first build a language-independent ASR system in a unified sequence-to-sequence (S2S) architecture with a vocabulary shared among all languages. During adaptation, we perform LM fusion transfer, in which an external LM is integrated into the decoder network of the attention-based S2S model throughout the adaptation stage, to effectively incorporate the linguistic context of the target language. We also investigate various seed models for transfer learning. Experimental evaluations on the IARPA BABEL data set show that LM fusion transfer improves performance on all five target languages compared with simple transfer learning when external text data is available. Our final system substantially narrows the performance gap to hybrid systems.
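The abstract does not give the fusion equations, so the following is only a minimal sketch of one way an external LM can be integrated into a decoder step. It assumes a gated fusion in the style of cold fusion: a gate computed from the decoder state and the LM state scales the LM state before the output projection. All names (`fuse_step`, `W_gate`, `W_out`), the dimensions, and the random weights are hypothetical and are not taken from the paper.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(W, v):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in z]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_step(dec_state, lm_state, W_gate, b_gate, W_out, b_out):
    """One decoding step of an assumed gated LM fusion (illustrative only)."""
    joint = dec_state + lm_state  # concatenation [s; h_lm]
    # Element-wise gate in (0, 1) deciding how much LM information to admit.
    gate = [sigmoid(a + b) for a, b in zip(matvec(W_gate, joint), b_gate)]
    gated_lm = [g * h for g, h in zip(gate, lm_state)]
    fused = dec_state + gated_lm  # concatenation [s; g * h_lm]
    logits = [a + b for a, b in zip(matvec(W_out, fused), b_out)]
    return softmax(logits), gate

# Hypothetical sizes: decoder state, LM state, shared vocabulary.
D, L, V = 4, 3, 5
rng = random.Random(0)

def rand_matrix(rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

W_gate, b_gate = rand_matrix(L, D + L), [0.0] * L
W_out, b_out = rand_matrix(V, D + L), [0.0] * V
dec_state = [rng.uniform(-1.0, 1.0) for _ in range(D)]
lm_state = [rng.uniform(-1.0, 1.0) for _ in range(L)]

probs, gate = fuse_step(dec_state, lm_state, W_gate, b_gate, W_out, b_out)
```

In a setup like this, the gate and output weights would be trained together with the S2S model during adaptation while the external LM supplies `lm_state`; whether the LM itself is kept fixed is a design choice this sketch does not address.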
Pages: 6096-6100
Page count: 5