SCALING END-TO-END MODELS FOR LARGE-SCALE MULTILINGUAL ASR

Cited by: 14
Authors
Li, Bo [1 ]
Pang, Ruoming [1 ]
Sainath, Tara N. [1 ]
Gulati, Anmol [1 ]
Zhang, Yu [1 ]
Qin, James [1 ]
Haghani, Parisa [1 ]
Huang, W. Ronny [1 ]
Ma, Min [1 ]
Bai, Junwen [1 ]
Affiliations
[1] Google, Mountain View, CA 94043 USA
Keywords
large-scale; multilingual speech recognition;
DOI
10.1109/ASRU51503.2021.9687871
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Building ASR models across many languages is a challenging multitask learning problem due to large variations and heavily unbalanced data. Existing work has shown positive transfer from high resource to low resource languages. However, degradations on high resource languages are commonly observed due to interference from the heterogeneous multilingual data and reduction in per-language capacity. We conduct a capacity study on a 15-language task, with the amount of data per language varying from 7.6K to 53.5K hours. We adopt GShard [1] to efficiently scale up to 10B parameters. Empirically, we find that (1) scaling the number of model parameters is an effective way to solve the capacity bottleneck - our 500M-param model already outperforms monolingual baselines and scaling it to 1B and 10B brought further quality gains; (2) larger models are not only more data efficient, but also more efficient in terms of training cost as measured in TPU days - the 1B-param model reaches the same accuracy at 34% of training time as the 500M-param model; (3) given a fixed capacity budget, adding depth works better than width and large encoders do better than large decoders; (4) with continuous training, they can be adapted to new languages and domains.
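Finding (3) compares depth against width at a fixed capacity budget. The sketch below illustrates the accounting behind such a comparison using a simplified Transformer layer parameterization (4·d² for the attention projections plus 8·d² for a 4x-expansion feed-forward block); the layer counts and widths are hypothetical round numbers, not configurations from the paper, whose models are Conformer-based and include convolution modules this sketch omits.

```python
# Illustrative parameter accounting for depth-vs-width at a fixed budget.
# Assumption: a simplified Transformer layer with 4*d^2 attention params
# (Q, K, V, output projections) and 8*d^2 feed-forward params (4x expansion).
# Sizes below are hypothetical, chosen only to match budgets, not taken
# from the paper's Conformer configurations.

def layer_params(d_model: int) -> int:
    # attention projections + feed-forward network
    return 4 * d_model**2 + 8 * d_model**2

def model_params(num_layers: int, d_model: int) -> int:
    return num_layers * layer_params(d_model)

# Two ways to spend roughly the same parameter budget: doubling the layer
# count, or widening d_model by ~sqrt(2) (params grow quadratically in width).
deep_narrow = model_params(num_layers=34, d_model=1024)
shallow_wide = model_params(num_layers=17, d_model=1448)

print(f"deep/narrow:  {deep_narrow / 1e6:.0f}M params")
print(f"shallow/wide: {shallow_wide / 1e6:.0f}M params")
```

Both configurations land near the same total, which is what makes "depth vs. width" a controlled comparison; the paper's empirical result is that, at matched budgets like these, the deeper configuration wins.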
Pages: 1011-1018
Page count: 8
Related Papers
50 records in total
  • [21] On the Comparison of Popular End-to-End Models for Large Scale Speech Recognition
    Li, Jinyu
    Wu, Yu
    Gaur, Yashesh
    Wang, Chengyi
    Zhao, Rui
    Liu, Shujie
    INTERSPEECH 2020, 2020, : 1 - 5
  • [22] DOES SPEECH ENHANCEMENT WORK WITH END-TO-END ASR OBJECTIVES? EXPERIMENTAL ANALYSIS OF MULTICHANNEL END-TO-END ASR
    Ochiai, Tsubasa
    Watanabe, Shinji
    Katagiri, Shigeru
    2017 IEEE 27TH INTERNATIONAL WORKSHOP ON MACHINE LEARNING FOR SIGNAL PROCESSING, 2017,
  • [23] Using Large Language Model for End-to-End Chinese ASR and NER
    Li, Yuang
    Yu, Jiawei
    Zhang, Min
    Ren, Mengxin
    Zhao, Yanqing
    Zhao, Xiaofeng
    Tao, Shimin
    Su, Jinsong
    Yang, Hao
    INTERSPEECH 2024, 2024, : 822 - 826
  • [24] TOWARDS CODE-SWITCHING ASR FOR END-TO-END CTC MODELS
    Li, Ke
    Li, Jinyu
    Ye, Guoli
    Zhao, Rui
    Gong, Yifan
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 6076 - 6080
  • [25] Exploring Targeted Universal Adversarial Perturbations to End-to-end ASR Models
    Lu, Zhiyun
    Han, Wei
    Zhang, Yu
    Cao, Liangliang
    INTERSPEECH 2021, 2021, : 3460 - 3464
  • [27] Improving low-resource Tibetan end-to-end ASR by multilingual and multilevel unit modeling
    Qin, Siqing
    Wang, Longbiao
    Li, Sheng
    Dang, Jianwu
    Pan, Lixin
    EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2022, 2022 (01)
  • [28] A large-scale, passive analysis of end-to-end TCP performance over GPRS
    Benko, P
    Malicsko, G
    Veres, A
    IEEE INFOCOM 2004: THE CONFERENCE ON COMPUTER COMMUNICATIONS, VOLS 1-4, PROCEEDINGS, 2004, : 1882 - 1892
  • [29] Vigil: Effective End-to-end Monitoring for Large-scale Recommender Systems at Glance
    Saxena, Priyansh
    Manisha, R.
    PROCEEDINGS OF THE 32ND ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, CIKM 2023, 2023, : 5249 - 5250
  • [30] FusedNet: End-to-End Mobile Robot Relocalization in Dynamic Large-Scale Scene
    Chen, Fang-xing
    Tang, Yifan
    Tai, Cong
    Liu, Xue-ping
    Wu, Xiang
    Zhang, Tao
    Zeng, Long
    IEEE ROBOTICS AND AUTOMATION LETTERS, 2024, 9 (05) : 4099 - 4105