DinoSR: Self-Distillation and Online Clustering for Self-supervised Speech Representation Learning

Cited by: 0
Authors
Liu, Alexander H. [1]
Chang, Heng-Jui [1]
Auli, Michael [2]
Hsu, Wei-Ning [2]
Glass, James [1]
Affiliations
[1] MIT, CSAIL, Cambridge, MA 02139 USA
[2] Meta AI, New York, NY USA
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
In this paper, we introduce self-distillation and online clustering for self-supervised speech representation learning (DinoSR), which combines masked language modeling, self-distillation, and online clustering. We show that these concepts complement each other and result in a strong representation learning model for speech. DinoSR first extracts contextualized embeddings from the input audio with a teacher network, then runs an online clustering system on the embeddings to yield a machine-discovered phone inventory, and finally uses the discretized tokens to guide a student network. We show that DinoSR surpasses previous state-of-the-art performance in several downstream tasks and provide a detailed analysis of the model and the learned discrete units. Code is available at https://github.com/Alexander-H-Liu/dinosr.
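The three steps named in the abstract (teacher embeddings, online clustering, discrete targets for the student) can be illustrated with a small NumPy sketch. This is not the authors' implementation: random arrays stand in for the teacher and student networks, and the function names, decay values, distance metric, and EMA-style codebook update are all assumptions chosen for illustration. The released code at the URL above is the authoritative version.

```python
import numpy as np

# Hypothetical helpers; names and hyperparameters are illustrative assumptions.

def ema_update(teacher, student, decay=0.999):
    """Self-distillation step: move each teacher parameter toward the student."""
    return {k: decay * teacher[k] + (1.0 - decay) * student[k] for k in teacher}

def assign_codes(emb, codebook):
    """Return the index of the nearest codeword (squared L2) for each frame."""
    dist = ((emb[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return dist.argmin(axis=1)

def update_codebook(emb, codes, sums, counts, decay=0.9):
    """Online clustering step: EMA of per-codeword embedding sums and counts."""
    one_hot = np.eye(sums.shape[0])[codes]              # (frames, codewords)
    sums = decay * sums + (1.0 - decay) * (one_hot.T @ emb)
    counts = decay * counts + (1.0 - decay) * one_hot.sum(axis=0)
    codebook = sums / np.maximum(counts, 1e-8)[:, None]
    return codebook, sums, counts

def cross_entropy(logits, targets):
    """Mean negative log-likelihood of the target code under the logits."""
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets].mean()

rng = np.random.default_rng(0)
T, D, K = 50, 8, 4                        # frames, embedding dim, codewords

teacher_emb = rng.normal(size=(T, D))     # stand-in for teacher outputs
codebook = rng.normal(size=(K, D))
sums, counts = codebook.copy(), np.ones(K)
mask = np.arange(T) % 2 == 0              # stand-in for masked positions

# 1) Discretize teacher embeddings, 2) refresh the codebook online,
# 3) score the student's predictions of the codes at masked positions.
codes = assign_codes(teacher_emb, codebook)
codebook, sums, counts = update_codebook(teacher_emb, codes, sums, counts)
student_logits = rng.normal(size=(T, K))  # stand-in for student outputs
loss = cross_entropy(student_logits[mask], codes[mask])

# The teacher tracks the student through an exponential moving average.
teacher = ema_update({"w": np.zeros(2)}, {"w": np.ones(2)}, decay=0.9)
```

In a real training loop the student would be updated by gradient descent on `loss`, and the teacher and codebook would be refreshed after each step; the sketch shows only a single pass of the data flow.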
Pages: 17
Related papers
50 in total
  • [41] SELF-SUPERVISED REPRESENTATION LEARNING FOR ULTRASOUND VIDEO
    Jiao, Jianbo
    Droste, Richard
    Drukker, Lior
    Papageorghiou, Aris T.
    Noble, J. Alison
    2020 IEEE 17TH INTERNATIONAL SYMPOSIUM ON BIOMEDICAL IMAGING (ISBI 2020), 2020, : 1847 - 1850
  • [42] Context Autoencoder for Self-supervised Representation Learning
    Xiaokang Chen
    Mingyu Ding
    Xiaodi Wang
    Ying Xin
    Shentong Mo
    Yunhao Wang
    Shumin Han
    Ping Luo
    Gang Zeng
    Jingdong Wang
    International Journal of Computer Vision, 2024, 132 : 208 - 223
  • [43] Adaptive Similarity Bootstrapping for Self-Distillation based Representation Learning
    Lebailly, Tim
    Stegmueller, Thomas
    Bozorgtabar, Behzad
    Thiran, Jean-Philippe
    Tuytelaars, Tinne
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 16459 - 16468
  • [44] Self-supervised Representation Learning for Astronomical Images
    Hayat, Md Abul
    Stein, George
    Harrington, Peter
    Lukic, Zarija
    Mustafa, Mustafa
    ASTROPHYSICAL JOURNAL LETTERS, 2021, 911 (02)
  • [45] Self-supervised representation learning for trip recommendation
    Gao, Qiang
    Wang, Wei
    Zhang, Kunpeng
    Yang, Xin
    Miao, Congcong
    Li, Tianrui
    KNOWLEDGE-BASED SYSTEMS, 2022, 247
  • [46] LeBenchmark: A Reproducible Framework for Assessing Self-Supervised Representation Learning from Speech
    Evain, Solene
    Ha Nguyen
    Hang Le
    Boito, Marcely Zanon
    Mdhaffar, Salima
    Alisamir, Sina
    Tong, Ziyi
    Tomashenko, Natalia
    Dinarelli, Marco
    Parcollet, Titouan
    Allauzen, Alexandre
    Esteve, Yannick
    Lecouteux, Benjamin
    Portet, Francois
    Rossato, Solange
    Ringeval, Fabien
    Schwab, Didier
    Besacier, Laurent
    INTERSPEECH 2021, 2021, : 1439 - 1443
  • [47] SelfDoc: Self-Supervised Document Representation Learning
    Li, Peizhao
    Gu, Jiuxiang
    Kuen, Jason
    Morariu, Vlad I.
    Zhao, Handong
    Jain, Rajiv
    Manjunatha, Varun
    Liu, Hongfu
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 5648 - 5656
  • [48] Solving Inefficiency of Self-supervised Representation Learning
    Wang, Guangrun
    Wang, Keze
    Wang, Guangcong
    Torr, Philip H. S.
    Lin, Liang
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 9485 - 9495
  • [49] Revisiting Self-Supervised Visual Representation Learning
    Kolesnikov, Alexander
    Zhai, Xiaohua
    Beyer, Lucas
    2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, : 1920 - 1929
  • [50] Self-supervised Representation Learning on Dynamic Graphs
    Tian, Sheng
    Wu, Ruofan
    Shi, Leilei
    Zhu, Liang
    Xiong, Tao
    PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON INFORMATION & KNOWLEDGE MANAGEMENT, CIKM 2021, 2021, : 1814 - 1823