FINE-TUNING OF PRE-TRAINED END-TO-END SPEECH RECOGNITION WITH GENERATIVE ADVERSARIAL NETWORKS

Cited by: 4
Authors
Haidar, Md Akmal [1]
Rezagholizadeh, Mehdi [1]
Affiliations
[1] Montreal Res Ctr, Huawei Noahs Ark Lab, Montreal, PQ, Canada
Keywords
automatic speech recognition; sequence-to-sequence; transformer; generative adversarial networks; adversarial training
DOI
10.1109/ICASSP39728.2021.9413703
Chinese Library Classification
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Adversarial training of end-to-end (E2E) ASR systems using generative adversarial networks (GANs) has recently been explored for low-resource ASR corpora. GANs help to learn the true data representation through a two-player min-max game. However, training an E2E ASR model on a large ASR corpus within a GAN framework has never been explored, because it might take an excessively long time due to high-variance gradient updates and might face convergence issues. In this paper, we introduce a novel framework for fine-tuning a pre-trained ASR model using the GAN objective, where the ASR model acts as a generator and a discriminator tries to distinguish the ASR output from the real data. Since the ASR model is pre-trained, we hypothesize that the ASR model output (soft distribution vectors) helps to get higher scores from the discriminator and makes the task of the discriminator harder within our GAN framework, which in turn improves the performance of the ASR model in the fine-tuning stage. Here, the pre-trained ASR model is fine-tuned adversarially against the discriminator using an additional adversarial loss. Experiments on the full LibriSpeech dataset show that our proposed approach outperforms baselines and conventional GAN-based adversarial models.
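The fine-tuning setup described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact loss, the discriminator architecture, or the loss weighting, so everything below is an assumption made for illustration: a toy linear discriminator (`discriminator`, `weights`, `bias`), the standard GAN discriminator loss, a non-saturating adversarial term for the generator, and a hypothetical weight `adv_weight` on that term. The "ASR output" is represented as a sequence of soft distribution vectors and the "real data" as one-hot target vectors.

```python
import math

def softmax(logits):
    """Turn one frame of logits into a soft distribution vector."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def one_hot(index, size):
    """Real-data representation of one target token."""
    return [1.0 if i == index else 0.0 for i in range(size)]

def discriminator(seq, weights, bias):
    """Hypothetical toy discriminator: mean of per-token linear scores, then sigmoid."""
    score = sum(sum(w * p for w, p in zip(weights, dist)) for dist in seq) / len(seq)
    return 1.0 / (1.0 + math.exp(-(score + bias)))

def cross_entropy(soft_seq, target_ids):
    """Usual supervised ASR loss, averaged over tokens."""
    return -sum(math.log(d[t]) for d, t in zip(soft_seq, target_ids)) / len(target_ids)

def generator_loss(soft_seq, target_ids, weights, bias, adv_weight=0.1):
    """Supervised loss plus an additional adversarial term (assumed non-saturating form)."""
    adv = -math.log(discriminator(soft_seq, weights, bias))
    return cross_entropy(soft_seq, target_ids) + adv_weight * adv

def discriminator_loss(soft_seq, target_ids, weights, bias):
    """Standard GAN discriminator loss: real one-hot targets vs. soft ASR output."""
    real = [one_hot(t, len(weights)) for t in target_ids]
    d_real = discriminator(real, weights, bias)
    d_fake = discriminator(soft_seq, weights, bias)
    return -(math.log(d_real) + math.log(1.0 - d_fake))

# Toy usage: three output tokens over a three-symbol vocabulary.
target_ids = [2, 0, 1]
soft_seq = [softmax(l) for l in ([0.1, 0.2, 2.0], [1.5, 0.3, 0.1], [0.2, 1.8, 0.4])]
weights, bias = [0.5, -0.3, 0.8], 0.0
print("generator loss:", generator_loss(soft_seq, target_ids, weights, bias))
print("discriminator loss:", discriminator_loss(soft_seq, target_ids, weights, bias))
```

In an actual training loop the two losses would be minimized alternately, with gradients flowing from the discriminator through the soft distribution vectors into the pre-trained ASR model; the sketch only evaluates the losses and performs no parameter updates.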
Pages: 6204-6208
Number of pages: 5