BigSSL: Exploring the Frontier of Large-Scale Semi-Supervised Learning for Automatic Speech Recognition

Cited by: 37
Authors
Zhang, Yu [1 ]
Park, Daniel S. [1 ]
Han, Wei [1 ]
Qin, James [1 ]
Gulati, Anmol [1 ]
Shor, Joel [1 ]
Jansen, Aren [1 ]
Xu, Yuanzhong [1 ]
Huang, Yanping [1 ]
Wang, Shibo [1 ]
Zhou, Zongwei [1 ]
Li, Bo [1 ]
Ma, Min [1 ]
Chan, William [1 ]
Yu, Jiahui [1 ]
Wang, Yongqiang [1 ]
Cao, Liangliang [1 ]
Sim, Khe Chai [1 ]
Ramabhadran, Bhuvana [1 ]
Sainath, Tara N. [1 ]
Beaufays, Francoise [1 ]
Chen, Zhifeng [1 ]
Le, Quoc, V [1 ]
Chiu, Chung-Cheng [1 ]
Pang, Ruoming [1 ]
Wu, Yonghui [1 ]
Affiliations
[1] Google Inc, Mountain View, CA 94043 USA
Keywords
Task analysis; Training; Data models; Benchmark testing; Semisupervised learning; Context modeling; Computational modeling; Giant model; large-scale self-supervised learning; self-supervised learning; semisupervised learning; speech recognition
DOI
10.1109/JSTSP.2022.3182537
Chinese Library Classification (CLC)
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline codes
0808; 0809
Abstract
We summarize the results of a host of efforts using giant automatic speech recognition (ASR) models pre-trained on large, diverse unlabeled datasets containing approximately a million hours of audio. We find that the combination of pre-training, self-training, and scaling up model size greatly increases data efficiency, even for extremely large tasks with tens of thousands of hours of labeled data. In particular, on an ASR task with 34,000 hours of labeled data, fine-tuning an 8-billion-parameter pre-trained Conformer model matches state-of-the-art (SoTA) performance with only 3% of the training data and significantly improves on SoTA with the full training set. We also report the universal benefits of using big pre-trained and self-trained models across a large set of downstream tasks that cover a wide range of speech domains and span multiple orders of magnitude in dataset size, including SoTA performance on many public benchmarks. In addition, we utilize the learned representations of the pre-trained networks to achieve SoTA results on non-ASR tasks.
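The self-training step the abstract refers to is, at its core, pseudo-labeling: a teacher model fit on labeled data labels the unlabeled pool, and a student is then trained on the union. As a minimal sketch of that loop (not the paper's actual pipeline), here is a toy version using a nearest-centroid classifier on 2-D points in place of a Conformer on audio; all data, labels, and function names are invented for illustration.

```python
# Toy sketch of self-training via pseudo-labeling. A "teacher" nearest-centroid
# model is fit on the small labeled set, it pseudo-labels the unlabeled pool,
# and a "student" is refit on labeled + pseudo-labeled data combined.

def centroids(points, labels):
    """Compute the mean point of each class label."""
    sums, counts = {}, {}
    for (x, y), c in zip(points, labels):
        sx, sy = sums.get(c, (0.0, 0.0))
        sums[c] = (sx + x, sy + y)
        counts[c] = counts.get(c, 0) + 1
    return {c: (sx / counts[c], sy / counts[c]) for c, (sx, sy) in sums.items()}

def predict(model, point):
    """Assign the label of the nearest class centroid (squared distance)."""
    x, y = point
    return min(model, key=lambda c: (model[c][0] - x) ** 2 + (model[c][1] - y) ** 2)

def self_train(labeled, unlabeled):
    """1) Fit a teacher on labeled data; 2) pseudo-label the unlabeled pool;
    3) refit a student on the combined labeled + pseudo-labeled set."""
    points, labels = zip(*labeled)
    teacher = centroids(points, labels)
    pseudo = [(p, predict(teacher, p)) for p in unlabeled]
    all_points, all_labels = zip(*(list(labeled) + pseudo))
    return centroids(all_points, all_labels)

labeled = [((0.0, 0.0), "a"), ((4.0, 4.0), "b")]
unlabeled = [(0.5, 0.5), (3.5, 3.5), (4.5, 4.0)]
student = self_train(labeled, unlabeled)
print(predict(student, (1.0, 1.0)))  # -> a
```

In the paper's setting the teacher and student are large pre-trained networks and the unlabeled pool is roughly a million hours of audio, but the data flow of the loop is the same.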
Pages: 1519-1532
Page count: 14