Linear-Complexity Self-Supervised Learning for Speech Processing

Cited by: 0
Authors
Zhang, Shucong [1 ]
Parcollet, Titouan [1 ]
van Dalen, Rogier [1 ]
Bhattacharya, Sourav [1 ]
Affiliations
[1] Samsung AI Center Cambridge, Cambridge, England
Source
INTERSPEECH 2024
Keywords
self-supervised learning; efficient models
DOI
10.21437/Interspeech.2024-500
Abstract
Self-supervised learning (SSL) models usually require weeks of pre-training with dozens of high-end GPUs. These models typically use a multi-head self-attention (MHSA) context encoder. However, MHSA takes quadratic time and space in the input length, contributing to the high pre-training cost. Linear-complexity alternatives to MHSA have been proposed. For instance, in supervised training, SummaryMixing is the first such alternative to outperform MHSA across multiple speech processing tasks. However, these cheaper alternatives have not yet been explored for SSL. This paper studies a linear-complexity context encoder for SSL for the first time. With better or equivalent performance on the downstream tasks of the MP3S benchmark, SummaryMixing reduces the pre-training time and peak VRAM of a wav2vec 2.0 model by 18% and 23%, respectively, allowing a 155M-parameter wav2vec 2.0 model to be pre-trained within one week on 4 Tesla A100 GPUs. Code is available.
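
The core idea behind SummaryMixing, the linear-complexity MHSA replacement named in the abstract, is that each frame interacts with a single global summary vector (the mean of per-frame summary features) instead of attending to every other frame. Below is a minimal PyTorch sketch of that idea only; the module and layer names, the GELU activations, and the concat-and-project combiner are illustrative assumptions, not the authors' exact implementation.

import torch
import torch.nn as nn

class SummaryMixing(nn.Module):
    # Sketch of a SummaryMixing-style block: a per-frame local transform f,
    # a per-frame summary transform s averaged over time into one global
    # vector, and a combiner c that merges the two. All names are assumed.
    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.local = nn.Sequential(nn.Linear(dim, hidden), nn.GELU())    # f: per-frame features
        self.summary = nn.Sequential(nn.Linear(dim, hidden), nn.GELU())  # s: summary features
        self.combine = nn.Linear(2 * hidden, dim)                        # c: merge local + global

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, dim)
        local = self.local(x)                             # (B, T, H)
        summary = self.summary(x).mean(dim=1)             # (B, H): one global summary
        summary = summary.unsqueeze(1).expand_as(local)   # broadcast back to (B, T, H)
        return self.combine(torch.cat([local, summary], dim=-1))  # (B, T, dim)

x = torch.randn(2, 100, 256)
print(SummaryMixing(256, 512)(x).shape)  # torch.Size([2, 100, 256])

Because the only cross-frame operation is a mean over time, cost grows linearly with the number of frames, which is what makes such a context encoder linear rather than quadratic in the input length.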
Pages: 3480-3484 (5 pages)