DinoSR: Self-Distillation and Online Clustering for Self-supervised Speech Representation Learning

Cited by: 0
Authors
Liu, Alexander H. [1 ]
Chang, Heng-Jui [1 ]
Auli, Michael [2 ]
Hsu, Wei-Ning [2 ]
Glass, James [1 ]
Affiliations
[1] MIT, CSAIL, Cambridge, MA 02139 USA
[2] Meta AI, New York, NY USA
Keywords
DOI
N/A
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405;
Abstract
In this paper, we introduce self-distillation and online clustering for self-supervised speech representation learning (DinoSR), which combines masked language modeling, self-distillation, and online clustering. We show that these concepts complement each other and result in a strong representation learning model for speech. DinoSR first extracts contextualized embeddings from the input audio with a teacher network, then runs an online clustering system on the embeddings to yield a machine-discovered phone inventory, and finally uses the discretized tokens to guide a student network. We show that DinoSR surpasses previous state-of-the-art performance on several downstream tasks, and we provide a detailed analysis of the model and the learned discrete units. Code is available at https://github.com/Alexander-H-Liu/dinosr.
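The abstract sketches a three-part training loop: an exponential-moving-average (EMA) teacher produces target embeddings from the clean audio, an online clustering step discretizes those embeddings into unit assignments, and a student network is trained to predict each masked frame's unit. The PyTorch sketch below is a minimal reading of that loop, not the authors' implementation (see the linked repository for that); the Encoder class, num_clusters, ema_decay, and the single-codebook setup are illustrative assumptions, whereas the paper's method clusters targets from multiple teacher layers.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    """Stand-in encoder; DinoSR itself uses a conv feature extractor plus a transformer."""
    def __init__(self, dim=256):
        super().__init__()
        self.proj = nn.Linear(80, dim)  # assume 80-dim log-mel input frames
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.body = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, x):               # x: (B, T, 80)
        return self.body(self.proj(x))  # (B, T, dim) contextualized embeddings

dim, num_clusters, ema_decay = 256, 64, 0.999   # illustrative hyperparameters
student = Encoder(dim)
teacher = Encoder(dim)
teacher.load_state_dict(student.state_dict())   # teacher starts as a copy of the student
for p in teacher.parameters():
    p.requires_grad_(False)

codebook = torch.randn(num_clusters, dim)       # online-clustering centroids
optimizer = torch.optim.Adam(student.parameters(), lr=1e-4)

def train_step(audio_feats, mask):
    """audio_feats: (B, T, 80) frames; mask: (B, T) bool, True at masked frames."""
    # 1) Teacher sees the unmasked input and produces target embeddings.
    with torch.no_grad():
        targets = teacher(audio_feats)[mask]                     # (N, dim)
        # 2) Online clustering: nearest-centroid assignment yields the
        #    discrete units; chosen centroids drift toward their members.
        assign = torch.cdist(targets, codebook).argmin(dim=-1)   # (N,)
        for k in assign.unique():
            centroid = targets[assign == k].mean(dim=0)
            codebook[k] = ema_decay * codebook[k] + (1 - ema_decay) * centroid
    # 3) Student sees the masked input and must predict each frame's unit.
    masked_in = audio_feats.masked_fill(mask.unsqueeze(-1), 0.0)
    logits = student(masked_in)[mask] @ codebook.t()             # (N, num_clusters)
    loss = F.cross_entropy(logits, assign)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # 4) Teacher tracks the student via exponential moving average.
    with torch.no_grad():
        for pt, ps in zip(teacher.parameters(), student.parameters()):
            pt.mul_(ema_decay).add_(ps, alpha=1.0 - ema_decay)
    return loss.item()

# Example usage with random data:
feats = torch.randn(4, 100, 80)    # batch of 4 utterances, 100 frames each
mask = torch.rand(4, 100) < 0.3    # mask roughly 30% of frames
print(train_step(feats, mask))
```

Calling train_step repeatedly on batches of features jointly refines the student, the EMA teacher, and the centroids; the point of the online codebook update is that the discrete units emerge during training rather than from a separate offline k-means pass over precomputed features.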
Pages: 17
Related Papers
50 records in total
  • [1] Self-distillation improves self-supervised learning for DNA sequence inference
    Yu, Tong
    Cheng, Lei
    Khalitov, Ruslan
    Olsson, Erland B.
    Yang, Zhirong
    Neural Networks, 2025, 183
  • [2] Clustering and Retraining Based Self-Supervised Speech Representation Learning Method
    Zhang, Wenlin
    Liu, Xuepeng
    Niu, Tong
    Yang, Xukui
    Qu, Dan
    Moshi Shibie yu Rengong Zhineng/Pattern Recognition and Artificial Intelligence, 2022, 35 (05): 461 - 471
  • [3] Self-Supervised Speech Representation Learning: A Review
    Mohamed, Abdelrahman
    Lee, Hung-yi
    Borgholt, Lasse
    Havtorn, Jakob D.
    Edin, Joakim
    Igel, Christian
    Kirchhoff, Katrin
    Li, Shang-Wen
    Livescu, Karen
    Maaloe, Lars
    Sainath, Tara N.
    Watanabe, Shinji
    IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2022, 16 (06): 1179 - 1210
  • [4] Self-supervised Anomaly Detection by Self-distillation and Negative Sampling
    Rafiee, Nima
    Gholamipoor, Rahil
    Adaloglou, Nikolas
    Jaxy, Simon
    Ramakers, Julius
    Kollmann, Markus
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2022, PT IV, 2022, 13532 : 459 - 470
  • [5] Monocular Depth Estimation via Self-Supervised Self-Distillation
    Hu, Haifeng
    Feng, Yuyang
    Li, Dapeng
    Zhang, Suofei
    Zhao, Haitao
    SENSORS, 2024, 24 (13)
  • [6] Self-supervised learning with self-distillation on COVID-19 medical image classification
    Tan, Zhiyong
    Yu, Yuhai
    Meng, Jiana
    Liu, Shuang
    Li, Wei
    COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE, 2024, 243
  • [7] Video Face Clustering with Self-Supervised Representation Learning
    Sharma, V.
    Tapaswi, M.
    Sarfraz, M. Saquib
    Stiefelhagen, R.
    IEEE Transactions on Biometrics, Behavior, and Identity Science, 2020, 2 (02): 145 - 157
  • [8] Self-Supervised Learning With Segmental Masking for Speech Representation
    Yue, Xianghu
    Lin, Jingru
    Gutierrez, Fabian Ritter
    Li, Haizhou
    IEEE Journal on Selected Topics in Signal Processing, 2022, 16 (06): 1367 - 1379
  • [9] Multi-Mode Online Knowledge Distillation for Self-Supervised Visual Representation Learning
    Song, Kaiyou
    Xie, Jin
    Zhang, Shan
    Luo, Zimeng
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 11848 - 11857