Conditional Independence for Pretext Task Selection in Self-Supervised Speech Representation Learning

Cited by: 2
Authors
Zaiem, Salah [1 ,2 ]
Parcollet, Titouan [2 ]
Essid, Slim [1 ]
Affiliations
[1] Inst Polytech Paris, Telecom Paris, LTCI, Palaiseau, France
[2] Avignon Univ, LIA, Avignon, France
Keywords
Self-Supervised Learning; Speech Representation Learning;
DOI
10.21437/Interspeech.2021-1027
CLC classification
R36 [Pathology]; R76 [Otorhinolaryngology]
Discipline codes
100104; 100213
Abstract
Through solving pretext tasks, self-supervised learning (SSL) leverages unlabeled data to extract useful latent representations that replace traditional input features in downstream tasks. A common pretext task consists of pretraining an SSL model on pseudo-labels derived from the original signal. This technique is particularly relevant for speech data, where various meaningful signal processing features may serve as pseudo-labels. However, the process of selecting pseudo-labels, for speech or other types of data, remains mostly unexplored and currently relies on observing the results on the final downstream task. This methodology is not sustainable at scale due to substantial computational (hence carbon) costs. Thus, this paper introduces a practical and theoretical framework to select relevant pseudo-labels with respect to a given downstream task. More precisely, we propose a functional estimator of pseudo-label utility grounded in conditional independence theory, which does not require any training. Experiments conducted on speaker recognition and automatic speech recognition validate our estimator, showing a significant correlation between the performance observed on the downstream task and the utility estimates obtained with our approach, facilitating the exploration of relevant pseudo-labels for self-supervised speech representation learning.
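The abstract describes a training-free utility estimator grounded in conditional independence theory. As a loose illustration only (not the authors' exact estimator), statistical dependence between a candidate pseudo-label and a downstream target can be quantified with a kernel-based measure such as the empirical Hilbert-Schmidt Independence Criterion (HSIC); the sketch below assumes RBF kernels and a hypothetical bandwidth choice:

```python
import numpy as np

def rbf_kernel(X, sigma=1.0):
    """Gaussian (RBF) kernel matrix from pairwise squared distances."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-d2 / (2.0 * sigma ** 2))

def hsic(X, Y, sigma=1.0):
    """Biased empirical HSIC: trace(K H L H) / (n - 1)^2.

    X, Y: (n, d) arrays of paired samples; H is the centering matrix.
    Larger values indicate stronger statistical dependence.
    """
    n = X.shape[0]
    K = rbf_kernel(X, sigma)
    L = rbf_kernel(Y, sigma)
    H = np.eye(n) - np.ones((n, n)) / n
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2
```

For example, a pseudo-label strongly coupled to the downstream target yields a larger HSIC score than an unrelated one, which is the kind of ranking signal a selection framework like this needs without any pretraining.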
Pages: 2851-2855
Page count: 5
Related papers
50 records in total
  • [21] Clustering and Retraining Based Self-Supervised Speech Representation Learning Method
    Zhang, Wenlin
    Liu, Xuepeng
    Niu, Tong
    Yang, Xukui
    Qu, Dan
    [J]. Moshi Shibie yu Rengong Zhineng/Pattern Recognition and Artificial Intelligence, 2022, 35 (05): : 461 - 471
  • [22] Dropout Regularization for Self-Supervised Learning of Transformer Encoder Speech Representation
    Luo, Jian
    Wang, Jianzong
    Cheng, Ning
    Xiao, Jing
    [J]. INTERSPEECH 2021, 2021, : 1169 - 1173
  • [23] On the (In)Efficiency of Acoustic Feature Extractors for Self-Supervised Speech Representation Learning
    Parcollet, Titouan
    Zhang, Shucong
    Ramos, Alberto Gil C. P.
    van Dalen, Rogier
    Bhattacharya, Sourav
    [J]. INTERSPEECH 2023, 2023, : 581 - 585
  • [24] Self-Supervised Disentangled Representation Learning for Robust Target Speech Extraction
    Mu, Zhaoxi
    Yang, Xinyu
    Sun, Sining
    Yang, Qing
    [J]. THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 17, 2024, : 18815 - 18823
  • [25] Automatic Pronunciation Assessment using Self-Supervised Speech Representation Learning
    Kim, Eesung
    Jeon, Jae-Jin
    Seo, Hyeji
    Kim, Hoon
    [J]. INTERSPEECH 2022, 2022, : 1411 - 1415
  • [26] EXPLORING THE INTEGRATION OF SPEECH SEPARATION AND RECOGNITION WITH SELF-SUPERVISED LEARNING REPRESENTATION
    Masuyama, Yoshiki
    Chang, Xuankai
    Zhang, Wangyou
    Cornell, Samuele
    Wang, Zhong-Qiu
    Ono, Nobutaka
    Qian, Yanmin
    Watanabe, Shinji
    [J]. 2023 IEEE WORKSHOP ON APPLICATIONS OF SIGNAL PROCESSING TO AUDIO AND ACOUSTICS, WASPAA, 2023,
  • [27] Whitening for Self-Supervised Representation Learning
    Ermolov, Aleksandr
    Siarohin, Aliaksandr
    Sangineto, Enver
    Sebe, Nicu
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
  • [28] Speech SimCLR: Combining Contrastive and Reconstruction Objective for Self-supervised Speech Representation Learning
    Jiang, Dongwei
    Li, Wubo
    Cao, Miao
    Zou, Wei
    Li, Xiangang
    [J]. INTERSPEECH 2021, 2021, : 1544 - 1548
  • [29] Self-Supervised Representation Learning for CAD
    Jones, Benjamin T.
    Hu, Michael
    Kodnongbua, Milin
    Kim, Vladimir G.
    Schulz, Adriana
    [J]. 2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 21327 - 21336
  • [30] Self-supervised pretext task collaborative multi-view contrastive learning for video action recognition
    Bi, Shuai
    Hu, Zhengping
    Zhao, Mengyao
    Zhang, Hehao
    Di, Jirui
    Sun, Zhe
    [J]. SIGNAL IMAGE AND VIDEO PROCESSING, 2023, 17 (07) : 3775 - 3782