DinoSR: Self-Distillation and Online Clustering for Self-supervised Speech Representation Learning

Cited by: 0

Authors:

Liu, Alexander H. [1]

Chang, Heng-Jui [1]

Auli, Michael [2]

Hsu, Wei-Ning [2]

Glass, James [1]

Affiliations:

[1] MIT, CSAIL, Cambridge, MA 02139 USA

[2] Meta AI, New York, NY USA

Source:

Advances in Neural Information Processing Systems 36 (NeurIPS 2023) | 2023

Keywords:

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

In this paper, we introduce self-distillation and online clustering for self-supervised speech representation learning (DinoSR), which combines masked language modeling, self-distillation, and online clustering. We show that these concepts complement each other and result in a strong representation learning model for speech. DinoSR first extracts contextualized embeddings from the input audio with a teacher network, then runs an online clustering system on the embeddings to yield a machine-discovered phone inventory, and finally uses the discretized tokens to guide a student network. We show that DinoSR surpasses previous state-of-the-art performance in several downstream tasks, and provide a detailed analysis of the model and the learned discrete units. Code available at https://github.com/Alexander-H-Liu/dinosr.
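The three steps named in the abstract (teacher embeddings, online clustering, discrete targets for the student) can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the exponential-moving-average (EMA) teacher update, the EMA codebook update, the decay values, and the plain-list data layout are all assumptions chosen for illustration.

```python
def ema_update(teacher, student, decay=0.999):
    """Move each teacher weight toward the student weight.

    Assumption: self-distillation is realised as an EMA of the student.
    """
    return [decay * t + (1.0 - decay) * s for t, s in zip(teacher, student)]


def nearest_code(codebook, x):
    """Return the index of the codeword closest to x (squared Euclidean distance)."""
    dists = [sum((c - v) ** 2 for c, v in zip(code, x)) for code in codebook]
    return dists.index(min(dists))


def online_cluster_step(sums, counts, embeddings, decay=0.9):
    """Assign teacher embeddings to codewords, then update the codebook by EMA.

    sums[k] is a running sum of embeddings assigned to cluster k and counts[k]
    is a running count; codeword k is sums[k] / counts[k] (hypothetical layout).
    Returns the discrete target index for every embedding.
    """
    codebook = [[s / n for s in vec] for vec, n in zip(sums, counts)]
    targets = [nearest_code(codebook, x) for x in embeddings]
    dim = len(sums[0])
    for k in range(len(sums)):
        members = [x for x, t in zip(embeddings, targets) if t == k]
        if not members:
            continue  # leave unused codewords untouched in this step
        batch_sum = [sum(x[d] for x in members) for d in range(dim)]
        sums[k] = [decay * s + (1.0 - decay) * b for s, b in zip(sums[k], batch_sum)]
        counts[k] = decay * counts[k] + (1.0 - decay) * len(members)
    return targets


# Toy usage: two 2-D codewords and three stand-in "teacher embeddings".
sums = [[0.0, 0.0], [1.0, 1.0]]
counts = [1.0, 1.0]
embeddings = [[0.1, 0.0], [0.9, 1.0], [1.1, 0.9]]
targets = online_cluster_step(sums, counts, embeddings)
```

In a full system, the returned `targets` would serve as classification labels that the student network predicts, for example with a cross-entropy loss at masked positions, and `ema_update` would be applied to the teacher parameters after each student optimisation step.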


Pages: 17
