DinoSR: Self-Distillation and Online Clustering for Self-supervised Speech Representation Learning

Cited by: 0
Authors
Liu, Alexander H. [1 ]
Chang, Heng-Jui [1 ]
Auli, Michael [2 ]
Hsu, Wei-Ning [2 ]
Glass, James [1 ]
Affiliations
[1] MIT, CSAIL, Cambridge, MA 02139 USA
[2] Meta AI, New York, NY USA
Keywords
DOI
N/A
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405;
Abstract
In this paper, we introduce self-distillation and online clustering for self-supervised speech representation learning (DinoSR), which combines masked language modeling, self-distillation, and online clustering. We show that these concepts complement each other and result in a strong representation learning model for speech. DinoSR first extracts contextualized embeddings from the input audio with a teacher network, then runs an online clustering system on the embeddings to yield a machine-discovered phone inventory, and finally uses the discretized tokens to guide a student network. We show that DinoSR surpasses previous state-of-the-art performance on several downstream tasks, and we provide a detailed analysis of the model and the learned discrete units. Code is available at https://github.com/Alexander-H-Liu/dinosr.
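The abstract sketches a three-part training loop: an exponential-moving-average (EMA) teacher produces target embeddings from the clean audio, an online clustering step discretizes those embeddings into unit assignments, and a student network is trained to predict each masked frame's unit. The PyTorch sketch below is a minimal reading of that loop, not the authors' implementation (see the linked repository for that); the Encoder class, num_clusters, ema_decay, and the single-codebook setup are illustrative assumptions, whereas the paper's method clusters targets from multiple teacher layers.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    """Stand-in encoder; DinoSR itself uses a conv feature extractor plus a transformer."""
    def __init__(self, dim=256):
        super().__init__()
        self.proj = nn.Linear(80, dim)  # assume 80-dim log-mel input frames
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.body = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, x):               # x: (B, T, 80)
        return self.body(self.proj(x))  # (B, T, dim) contextualized embeddings

dim, num_clusters, ema_decay = 256, 64, 0.999   # illustrative hyperparameters
student = Encoder(dim)
teacher = Encoder(dim)
teacher.load_state_dict(student.state_dict())   # teacher starts as a copy of the student
for p in teacher.parameters():
    p.requires_grad_(False)

codebook = torch.randn(num_clusters, dim)       # online-clustering centroids
optimizer = torch.optim.Adam(student.parameters(), lr=1e-4)

def train_step(audio_feats, mask):
    """audio_feats: (B, T, 80) frames; mask: (B, T) bool, True at masked frames."""
    # 1) Teacher sees the unmasked input and produces target embeddings.
    with torch.no_grad():
        targets = teacher(audio_feats)[mask]                     # (N, dim)
        # 2) Online clustering: nearest-centroid assignment yields the
        #    discrete units; chosen centroids drift toward their members.
        assign = torch.cdist(targets, codebook).argmin(dim=-1)   # (N,)
        for k in assign.unique():
            centroid = targets[assign == k].mean(dim=0)
            codebook[k] = ema_decay * codebook[k] + (1 - ema_decay) * centroid
    # 3) Student sees the masked input and must predict each frame's unit.
    masked_in = audio_feats.masked_fill(mask.unsqueeze(-1), 0.0)
    logits = student(masked_in)[mask] @ codebook.t()             # (N, num_clusters)
    loss = F.cross_entropy(logits, assign)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # 4) Teacher tracks the student via exponential moving average.
    with torch.no_grad():
        for pt, ps in zip(teacher.parameters(), student.parameters()):
            pt.mul_(ema_decay).add_(ps, alpha=1.0 - ema_decay)
    return loss.item()

# Example usage with random data:
feats = torch.randn(4, 100, 80)    # batch of 4 utterances, 100 frames each
mask = torch.rand(4, 100) < 0.3    # mask roughly 30% of frames
print(train_step(feats, mask))
```

Calling train_step repeatedly on batches of features jointly refines the student, the EMA teacher, and the centroids; the point of the online codebook update is that the discrete units emerge during training rather than from a separate offline k-means pass over precomputed features.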
Pages: 17
Related Papers
50 records in total
  • [1] Self-distillation improves self-supervised learning for DNA sequence inference
    Yu, Tong
    Cheng, Lei
    Khalitov, Ruslan
    Olsson, Erland B.
    Yang, Zhirong
    Neural Networks, 2025, 183
  • [2] Clustering and Retraining Based Self-Supervised Speech Representation Learning Method
    Zhang, Wenlin
    Liu, Xuepeng
    Niu, Tong
    Yang, Xukui
    Qu, Dan
    Moshi Shibie yu Rengong Zhineng/Pattern Recognition and Artificial Intelligence, 2022, 35 (05): 461 - 471
  • [3] Self-Supervised Speech Representation Learning: A Review
    Mohamed, Abdelrahman
    Lee, Hung-yi
    Borgholt, Lasse
    Havtorn, Jakob D.
    Edin, Joakim
    Igel, Christian
    Kirchhoff, Katrin
    Li, Shang-Wen
    Livescu, Karen
    Maaloe, Lars
    Sainath, Tara N.
    Watanabe, Shinji
    IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2022, 16 (06): 1179 - 1210
  • [4] Self-supervised Anomaly Detection by Self-distillation and Negative Sampling
    Rafiee, Nima
    Gholamipoor, Rahil
    Adaloglou, Nikolas
    Jaxy, Simon
    Ramakers, Julius
    Kollmann, Markus
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2022, PT IV, 2022, 13532 : 459 - 470
  • [5] Monocular Depth Estimation via Self-Supervised Self-Distillation
    Hu, Haifeng
    Feng, Yuyang
    Li, Dapeng
    Zhang, Suofei
    Zhao, Haitao
    SENSORS, 2024, 24 (13)
  • [6] Self-supervised learning with self-distillation on COVID-19 medical image classification
    Tan, Zhiyong
    Yu, Yuhai
    Meng, Jiana
    Liu, Shuang
    Li, Wei
    COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE, 2024, 243
  • [7] Video Face Clustering with Self-Supervised Representation Learning
    Sharma, V.
    Tapaswi, M.
    Sarfraz, M. Saquib
    Stiefelhagen, R.
    IEEE Transactions on Biometrics, Behavior, and Identity Science, 2020, 2 (02): 145 - 157
  • [8] Self-Supervised Learning With Segmental Masking for Speech Representation
    Yue, Xianghu
    Lin, Jingru
    Gutierrez, Fabian Ritter
    Li, Haizhou
    IEEE Journal on Selected Topics in Signal Processing, 2022, 16 (06): 1367 - 1379
  • [9] Multi-Mode Online Knowledge Distillation for Self-Supervised Visual Representation Learning
    Song, Kaiyou
    Xie, Jin
    Zhang, Shan
    Luo, Zimeng
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 11848 - 11857