Linear-Complexity Self-Supervised Learning for Speech Processing

Cited by: 0
Authors
Zhang, Shucong [1 ]
Parcollet, Titouan [1 ]
van Dalen, Rogier [1 ]
Bhattacharya, Sourav [1 ]
Affiliations
[1] Samsung AI Center Cambridge, Cambridge, England
Source
INTERSPEECH 2024
Keywords
self-supervised learning; efficient models
DOI
10.21437/Interspeech.2024-500
Abstract
Self-supervised learning (SSL) models usually require weeks of pre-training with dozens of high-end GPUs. These models typically use a multi-head self-attention (MHSA) context encoder. However, MHSA takes quadratic time and space in the input length, contributing to the high pre-training cost. Linear-complexity alternatives to MHSA have been proposed. For instance, in supervised training, SummaryMixing is the first such alternative to outperform MHSA across multiple speech processing tasks. However, these cheaper alternatives have not yet been explored for SSL. This paper studies a linear-complexity context encoder for SSL for the first time. With better or equivalent performance on the downstream tasks of the MP3S benchmark, SummaryMixing reduces the pre-training time and peak VRAM of a wav2vec 2.0 model by 18% and 23%, respectively, allowing a 155M-parameter wav2vec 2.0 model to be pre-trained within one week on 4 Tesla A100 GPUs. Code is available.
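
The core idea behind SummaryMixing, the linear-complexity MHSA replacement named in the abstract, is that each frame interacts with a single global summary vector (the mean of per-frame summary features) instead of attending to every other frame. Below is a minimal PyTorch sketch of that idea only; the module and layer names, the GELU activations, and the concat-and-project combiner are illustrative assumptions, not the authors' exact implementation.

import torch
import torch.nn as nn

class SummaryMixing(nn.Module):
    # Sketch of a SummaryMixing-style block: a per-frame local transform f,
    # a per-frame summary transform s averaged over time into one global
    # vector, and a combiner c that merges the two. All names are assumed.
    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.local = nn.Sequential(nn.Linear(dim, hidden), nn.GELU())    # f: per-frame features
        self.summary = nn.Sequential(nn.Linear(dim, hidden), nn.GELU())  # s: summary features
        self.combine = nn.Linear(2 * hidden, dim)                        # c: merge local + global

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, dim)
        local = self.local(x)                             # (B, T, H)
        summary = self.summary(x).mean(dim=1)             # (B, H): one global summary
        summary = summary.unsqueeze(1).expand_as(local)   # broadcast back to (B, T, H)
        return self.combine(torch.cat([local, summary], dim=-1))  # (B, T, dim)

x = torch.randn(2, 100, 256)
print(SummaryMixing(256, 512)(x).shape)  # torch.Size([2, 100, 256])

Because the only cross-frame operation is a mean over time, cost grows linearly with the number of frames, which is what makes such a context encoder linear rather than quadratic in the input length.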
Pages: 3480-3484 (5 pages)