Balanced-Wav2Vec: Enhancing Stability and Robustness of Representation Learning Through Sample Reweighting Techniques

Cited by: 0

Authors:

Lee, Mun-Hak [1]

Lee, Jae-Hong [1]

Kim, DoHee [2]

Kol, Ye-Eun [1]

Chang, Joon-Hyuk [1]

Affiliations:

[1] Hanyang Univ, Dept Elect Engn, Seoul, South Korea

[2] Hanyang Univ, Dept Artificial Intelligence Applicat, Seoul, South Korea

Source:

INTERSPEECH 2024 | 2024

Funding:

National Research Foundation, Singapore;

Keywords:

self-supervised learning; Wav2Vec 2.0; mode collapse; diversity loss; speech recognition;

DOI:

10.21437/Interspeech.2024-1875

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Mode collapse refers to the phenomenon where a representation model fits only a subset of modes in the feature space. Today, numerous self-supervised learning algorithms, including Wav2Vec 2.0, encounter the problem of reduced expressiveness due to mode collapse or dimension collapse. In this study, we experimentally verify that the highly skewed codebook distribution of Wav2Vec 2.0 exacerbates the mode collapse problem. Based on this empirical finding, we propose the balanced-infoNCE loss, which suppresses the emergence of over-represented modes. We show that the Wav2Vec 2.0 model trained with balanced-infoNCE loss maintains high codebook entropy and converges stably. Furthermore, through fine-tuning experiments on a multilingual dataset for the ASR task, we demonstrate that balanced-Wav2Vec 2.0 models exhibit superior generalization performance.
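
The abstract contrasts a highly skewed codebook distribution with one that maintains high codebook entropy. As a rough illustration only, the sketch below computes the Shannon entropy of an empirical code-usage distribution. It is not the authors' implementation and does not reproduce the balanced-infoNCE loss; the function names `codebook_entropy` and `normalized_entropy`, and the toy index lists, are assumptions made for this example.

```python
import math
from collections import Counter


def codebook_entropy(code_indices):
    """Shannon entropy (in nats) of the empirical code-usage distribution.

    `code_indices` is a hypothetical list of integer codebook indices,
    one per quantized frame. An empty list yields 0.0.
    """
    counts = Counter(code_indices)
    total = len(code_indices)
    entropy = 0.0
    for count in counts.values():
        p = count / total
        entropy -= p * math.log(p)
    return entropy


def normalized_entropy(code_indices, codebook_size):
    """Entropy divided by its maximum, log(codebook_size); lies in [0, 1]."""
    return codebook_entropy(code_indices) / math.log(codebook_size)


# Toy data (assumed for illustration): a 4-entry codebook.
balanced_usage = [0, 1, 2, 3] * 25      # every code used equally often
skewed_usage = [0] * 97 + [1, 2, 3]     # one over-represented code

print(normalized_entropy(balanced_usage, 4))
print(normalized_entropy(skewed_usage, 4))
```

Under this measure, uniform usage of all codes reaches the maximum value, while usage dominated by a few over-represented codes scores much lower, which is the kind of skew the abstract associates with mode collapse.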


Pages: 5058-5062

Page count: 5

6 items in total

[1] Wav2vec-C: A Self-supervised Model for Speech Representation Learning
Sadhu, Samik
He, Di
Huang, Che-Wei
Mallidi, Sri Harish
Wu, Minhua
Rastrow, Ariya
Stolcke, Andreas
Droppo, Jasha
Maas, Roland
INTERSPEECH 2021, 2021, : 711 - 715
[2] Multichannel AV-wav2vec2: A Framework for Learning Multichannel Multi-Modal Speech Representation
Zhu, Qiushi
Zhang, Jie
Gu, Yu
Hu, Yuchen
Dai, Lirong
THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 17, 2024, : 19768 - 19776
[3] CTRL: Continual Representation Learning to Transfer Information of Pre-trained for WAV2VEC 2.0
Lee, Jae-Hong
Lee, Chae-Won
Choi, Jin-Seong
Chang, Joon-Hyuk
Seong, Woo Kyeong
Lee, Jeonghan
INTERSPEECH 2022, 2022, : 3398 - 3402
[4] Enhancing Language Identification in Indian Context Through Exploiting Learned Features with Wav2Vec2.0
Gupta, Shivang
Motepalli, Kowshik Siva Sai
Kumar, Ravi
Narasinga, Vamsi
Mirishkar, Sai Ganesh
Vuppala, Anil Kumar
SPEECH AND COMPUTER, SPECOM 2023, PT II, 2023, 14339 : 503 - 512
[5] Affinity2Vec: drug-target binding affinity prediction through representation learning, graph mining, and machine learning
Maha A. Thafar
Mona Alshahrani
Somayah Albaradei
Takashi Gojobori
Magbubah Essack
Xin Gao
Scientific Reports, 12
[6] Affinity2Vec: drug-target binding affinity prediction through representation learning, graph mining, and machine learning
Thafar, Maha A.
Alshahrani, Mona
Albaradei, Somayah
Gojobori, Takashi
Essack, Magbubah
Gao, Xin
SCIENTIFIC REPORTS, 2022, 12 (01)
