Conditional Independence for Pretext Task Selection in Self-Supervised Speech Representation Learning

Cited by: 2
Authors
Zaiem, Salah [1 ,2 ]
Parcollet, Titouan [2 ]
Essid, Slim [1 ]
Affiliations
[1] Inst Polytech Paris, Telecom Paris, LTCI, Palaiseau, France
[2] Avignon Univ, LIA, Avignon, France
Source
INTERSPEECH 2021
Keywords
Self-Supervised Learning; Speech Representation Learning;
DOI
10.21437/Interspeech.2021-1027
Chinese Library Classification (CLC)
R36 [Pathology]; R76 [Otorhinolaryngology]
Discipline Codes
100104; 100213
Abstract
Through solving pretext tasks, self-supervised learning (SSL) leverages unlabeled data to extract useful latent representations that replace traditional input features in the downstream task. A common pretext task consists in pretraining an SSL model on pseudo-labels derived from the original signal. This technique is particularly relevant for speech data, where various meaningful signal processing features may serve as pseudo-labels. However, the process of selecting pseudo-labels, for speech or other types of data, remains mostly unexplored and currently relies on observing the results on the final downstream task. Nevertheless, this methodology is not sustainable at scale due to substantial computational (hence carbon) costs. Thus, this paper introduces a practical and theoretical framework for selecting relevant pseudo-labels with respect to a given downstream task. More precisely, we propose a functional estimator of pseudo-label utility grounded in conditional independence theory, which does not require any training. Experiments conducted on speaker recognition and automatic speech recognition validate our estimator, showing a significant correlation between the performance observed on the downstream task and the utility estimates obtained with our approach, facilitating the prospection of relevant pseudo-labels for self-supervised speech representation learning.
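The abstract describes a training-free estimator of pseudo-label utility grounded in conditional independence theory. As a rough, hypothetical illustration of the kind of kernel dependence measure such an estimator can build on (not the authors' exact formulation), the sketch below computes the biased empirical Hilbert-Schmidt Independence Criterion (HSIC) with RBF kernels in NumPy; higher scores indicate stronger statistical dependence between two sets of paired samples:

```python
import numpy as np

def rbf_kernel(X, sigma=1.0):
    """Gaussian (RBF) kernel matrix for rows of X."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T  # pairwise squared distances
    return np.exp(-d2 / (2.0 * sigma ** 2))

def hsic(X, Y, sigma=1.0):
    """Biased empirical HSIC estimate: trace(K H L H) / (n - 1)^2."""
    n = X.shape[0]
    K = rbf_kernel(X, sigma)
    L = rbf_kernel(Y, sigma)
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1))
Y_dep = X + 0.1 * rng.normal(size=(200, 1))   # strongly dependent on X
Y_ind = rng.normal(size=(200, 1))             # independent of X
print(hsic(X, Y_dep), hsic(X, Y_ind))         # dependent pair scores higher
```

In a pseudo-label-selection setting, such a score would be computed between candidate signal-processing features and downstream labels; the variable names and the plain HSIC formulation here are illustrative assumptions, not the paper's exact estimator.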
Pages: 2851-2855
Page count: 5
Related Papers
50 records in total
  • [1] Pretext Tasks Selection for Multitask Self-Supervised Audio Representation Learning
    Zaiem, Salah
    Parcollet, Titouan
    Essid, Slim
    Heba, Abdelwahab
    [J]. IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2022, 16 (06) : 1439 - 1453
  • [2] Self-Supervised Feature Enhancement: Applying Internal Pretext Task to Supervised Learning
    Xie, Tianshu
    Yang, Yuhang
    Ding, Zilin
    Cheng, Xuan
    Wang, Xiaomin
    Gong, Haigang
    Liu, Ming
    [J]. IEEE ACCESS, 2023, 11 : 1708 - 1717
  • [3] Self-Supervised Speech Representation Learning: A Review
    Mohamed, Abdelrahman
    Lee, Hung-yi
    Borgholt, Lasse
    Havtorn, Jakob D.
    Edin, Joakim
    Igel, Christian
    Kirchhoff, Katrin
    Li, Shang-Wen
    Livescu, Karen
    Maaloe, Lars
    Sainath, Tara N.
    Watanabe, Shinji
    [J]. IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2022, 16 (06) : 1179 - 1210
  • [4] GCLR: A self-supervised representation learning pretext task for glomerular filtration barrier segmentation in TEM images
    Lin, Guoyu
    Zhang, Zhentai
    Long, Kaixing
    Zhang, Yiwen
    Lu, Yanmeng
    Geng, Jian
    Zhou, Zhitao
    Feng, Qianjin
    Lu, Lijun
    Cao, Lei
    [J]. ARTIFICIAL INTELLIGENCE IN MEDICINE, 2023, 146
  • [5] Contrastive Spatio-Temporal Pretext Learning for Self-Supervised Video Representation
    Zhang, Yujia
    Po, Lai-Man
    Xu, Xuyuan
    Liu, Mengyang
    Wang, Yexin
    Ou, Weifeng
    Zhao, Yuzhi
    Yu, Wing-Yin
    [J]. THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / THE TWELFTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, : 3380 - 3389
  • [6] Self-Supervised Learning With Segmental Masking for Speech Representation
    Yue, Xianghu
    Lin, Jingru
    Gutierrez, Fabian Ritter
    Li, Haizhou
    [J]. IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2022, 16 (06) : 1367 - 1379
  • [8] Phonetically Motivated Self-Supervised Speech Representation Learning
    Yue, Xianghu
    Li, Haizhou
    [J]. INTERSPEECH 2021, 2021, : 746 - 750
  • [9] Automatic Data Augmentation Selection and Parametrization in Contrastive Self-Supervised Speech Representation Learning
    Zaiem, Salah
    Parcollet, Titouan
    Essid, Slim
    [J]. INTERSPEECH 2022, 2022, : 669 - 673
  • [10] Mixup Feature: A Pretext Task Self-Supervised Learning Method for Enhanced Visual Feature Learning
    Xu, Jiashu
    Stirenko, Sergii
    [J]. IEEE ACCESS, 2023, 11 : 82400 - 82409