BigSSL: Exploring the Frontier of Large-Scale Semi-Supervised Learning for Automatic Speech Recognition

Cited by: 37
Authors
Zhang, Yu [1 ]
Park, Daniel S. [1 ]
Han, Wei [1 ]
Qin, James [1 ]
Gulati, Anmol [1 ]
Shor, Joel [1 ]
Jansen, Aren [1 ]
Xu, Yuanzhong [1 ]
Huang, Yanping [1 ]
Wang, Shibo [1 ]
Zhou, Zongwei [1 ]
Li, Bo [1 ]
Ma, Min [1 ]
Chan, William [1 ]
Yu, Jiahui [1 ]
Wang, Yongqiang [1 ]
Cao, Liangliang [1 ]
Sim, Khe Chai [1 ]
Ramabhadran, Bhuvana [1 ]
Sainath, Tara N. [1 ]
Beaufays, Francoise [1 ]
Chen, Zhifeng [1 ]
Le, Quoc V. [1 ]
Chiu, Chung-Cheng [1 ]
Pang, Ruoming [1 ]
Wu, Yonghui [1 ]
Affiliations
[1] Google Inc, Mountain View, CA 94043 USA
Keywords
Task analysis; Training; Data models; Benchmark testing; Semi-supervised learning; Context modeling; Computational modeling; Giant model; large-scale self-supervised learning; self-supervised learning; semi-supervised learning; speech recognition
DOI
10.1109/JSTSP.2022.3182537
Chinese Library Classification
TM [Electrical Technology]; TN [Electronic Technology, Communication Technology]
Subject Classification Codes
0808 ; 0809 ;
Abstract
We summarize the results of a host of efforts using giant automatic speech recognition (ASR) models pre-trained using large, diverse unlabeled datasets containing approximately a million hours of audio. We find that the combination of pre-training, self-training, and scaling up model size greatly increases data efficiency, even for extremely large tasks with tens of thousands of hours of labeled data. In particular, on an ASR task with 34,000 hours of labeled data, by fine-tuning an 8-billion-parameter pre-trained Conformer model we can match state-of-the-art (SoTA) performance with only 3% of the training data and significantly improve SoTA with the full training set. We also report on the universal benefits gained from using big pre-trained and self-trained models for a large set of downstream tasks that cover a wide range of speech domains and span multiple orders of magnitude of dataset sizes, including obtaining SoTA performance on many public benchmarks. In addition, we utilize the learned representations of pre-trained networks to achieve SoTA results on non-ASR tasks.
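The self-training half of the recipe the abstract describes is, at its core, a pseudo-labeling loop: a teacher model transcribes unlabeled audio, low-confidence outputs are filtered, and the surviving machine-labeled examples are merged with the labeled set for student training. A minimal, purely illustrative sketch of that loop follows; the toy `teacher_predict` and `confidence` stand-ins are hypothetical placeholders, not the paper's actual Conformer teacher or confidence filter.

```python
def teacher_predict(x):
    # Stand-in for a fine-tuned teacher ASR model: here, a toy rule that
    # "labels" an input number by its parity.
    return x % 2

def confidence(x):
    # Stand-in confidence score; real pipelines filter pseudo-labels by
    # the teacher's confidence before adding them to the training set.
    return 1.0 if x % 10 != 9 else 0.3

def self_train(labeled, unlabeled, threshold=0.5):
    """Return the labeled set augmented with confident pseudo-labeled pairs."""
    pseudo = [(x, teacher_predict(x))
              for x in unlabeled
              if confidence(x) >= threshold]
    return labeled + pseudo

labeled = [(0, 0), (1, 1), (2, 0)]
unlabeled = [3, 4, 9, 10]

augmented = self_train(labeled, unlabeled)
# 9 is dropped by the confidence threshold; 3, 4, and 10 are pseudo-labeled.
print(augmented)  # [(0, 0), (1, 1), (2, 0), (3, 1), (4, 0), (10, 0)]
```

In the paper's setting the student is also a giant pre-trained model and the loop may be iterated (noisy student training), with the student becoming the next round's teacher.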
Pages: 1519-1532
Page count: 14
Related Papers
50 total
  • [21] Regularized Urdu Speech Recognition with Semi-Supervised Deep Learning
    Humayun, Mohammad Ali
    Hameed, Ibrahim A.
    Shah, Syed Muslim
    Khan, Sohaib Hassan
    Zafar, Irfan
    Bin Ahmed, Saad
    Shuja, Junaid
    [J]. APPLIED SCIENCES-BASEL, 2019, 9 (09):
  • [22] Alternative Pseudo-Labeling for Semi-Supervised Automatic Speech Recognition
    Zhu, Han
    Gao, Dongji
    Cheng, Gaofeng
    Povey, Daniel
    Zhang, Pengyuan
    Yan, Yonghong
    [J]. IEEE/ACM Transactions on Audio Speech and Language Processing, 2023, 31 : 3320 - 3330
  • [23] Semi-supervised Learning for Large Scale Image Cosegmentation
    Wang, Zhengxiang
    Liu, Rujie
    [J]. 2013 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2013, : 393 - 400
  • [24] Automatic Leaf Recognition Based on Deep Semi-Supervised Learning
    Wu, Huisi
    Xiao, Fangyan
    Shi, Zhouan
    Wen, Zhenkun
    [J]. Jisuanji Fuzhu Sheji Yu Tuxingxue Xuebao/Journal of Computer-Aided Design and Computer Graphics, 2023, 35 (10): : 1469 - 1478
  • [25] Semi-Supervised Learning in Large Scale Text Categorization
    Xu, Zewen
    Li, Jianqiang
    Liu, Bo
    Bi, Jing
    Li, Rong
    Mao, Rui
    [J]. Journal of Shanghai Jiaotong University(Science), 2017, 22 (03) : 291 - 302
  • [27] Exploring Transformers for Large-Scale Speech Recognition
    Lu, Liang
    Liu, Changliang
    Li, Jinyu
    Gong, Yifan
    [J]. INTERSPEECH 2020, 2020, : 5041 - 5045
  • [28] Semi-Supervised Learning of Speech Sounds
    Jansen, Aren
    Niyogi, Partha
    [J]. INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4, 2007, : 2264 - 2267
  • [29] Semi-supervised multi-view binary learning for large-scale image clustering
    Liu, Mingyang
    Yang, Zuyuan
    Han, Wei
    Chen, Junhang
    Sun, Weijun
    [J]. APPLIED INTELLIGENCE, 2022, 52 (13) : 14853 - 14870
  • [30] LARGE-SCALE ASR DOMAIN ADAPTATION USING SELF- AND SEMI-SUPERVISED LEARNING
    Hwang, Dongseong
    Misra, Ananya
    Huo, Zhouyuan
    Siddhartha, Nikhil
    Garg, Shefali
    Qiu, David
    Sim, Khe Chai
    Strohman, Trevor
    Beaufays, Francoise
    He, Yanzhang
    [J]. 2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 6627 - 6631