Balanced-Wav2Vec: Enhancing Stability and Robustness of Representation Learning Through Sample Reweighting Techniques

Cited by: 0
Authors
Lee, Mun-Hak [1]
Lee, Jae-Hong [1]
Kim, DoHee [2]
Kol, Ye-Eun [1]
Chang, Joon-Hyuk [1]
Affiliations
[1] Hanyang Univ, Dept Elect Engn, Seoul, South Korea
[2] Hanyang Univ, Dept Artificial Intelligence Applicat, Seoul, South Korea
Source
INTERSPEECH 2024
Funding
National Research Foundation of Singapore;
Keywords
self-supervised learning; Wav2Vec 2.0; mode collapse; diversity loss; speech recognition;
DOI
10.21437/Interspeech.2024-1875
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Mode collapse refers to the phenomenon where a representation model fits only a subset of modes in the feature space. Today, numerous self-supervised learning algorithms, including Wav2Vec 2.0, suffer from reduced expressiveness due to mode collapse or dimension collapse. In this study, we experimentally verify that the highly skewed codebook distribution of Wav2Vec 2.0 exacerbates the mode collapse problem. Based on this empirical finding, we propose the balanced-infoNCE loss, which suppresses the emergence of over-represented modes. We show that the Wav2Vec 2.0 model trained with balanced-infoNCE loss maintains high codebook entropy and converges stably. Furthermore, through fine-tuning experiments on a multilingual dataset for the ASR task, we demonstrate that balanced-Wav2Vec 2.0 models exhibit superior generalization performance.
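The abstract contrasts a highly skewed codebook distribution with one that keeps high codebook entropy. As a rough illustration only, the sketch below computes the Shannon entropy of an empirical code-usage distribution and normalizes it by its maximum. The function names and the toy index lists are assumptions made for this example; this is not the paper's balanced-infoNCE loss, which the record does not define.

```python
import math
from collections import Counter


def codebook_entropy(code_indices):
    """Shannon entropy (in nats) of the empirical code-usage distribution."""
    counts = Counter(code_indices)
    total = len(code_indices)
    entropy = 0.0
    for count in counts.values():
        p = count / total
        entropy -= p * math.log(p)
    return entropy


def normalized_entropy(code_indices, codebook_size):
    """Entropy divided by its maximum, log(codebook_size); lies in [0, 1]."""
    return codebook_entropy(code_indices) / math.log(codebook_size)


# Hypothetical code assignments for a 4-entry codebook.
balanced = [0, 1, 2, 3] * 25          # every code used equally often
skewed = [0] * 97 + [1, 2, 3]         # one over-represented code

print(normalized_entropy(balanced, 4))  # approximately 1.0
print(normalized_entropy(skewed, 4))    # much lower than the balanced case
```

A normalized value near 1 means the codes are used about equally often, while a value near 0 means a few codes dominate, which is the skew the abstract associates with mode collapse.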
Pages: 5058-5062
Page count: 5