FINE-TUNING OF PRE-TRAINED END-TO-END SPEECH RECOGNITION WITH GENERATIVE ADVERSARIAL NETWORKS

Cited by: 4
Authors
Haidar, Md Akmal [1 ]
Rezagholizadeh, Mehdi [1 ]
Affiliations
[1] Huawei Noah's Ark Lab, Montreal Research Centre, Montreal, QC, Canada
Keywords
automatic speech recognition; sequence-to-sequence; transformer; generative adversarial networks; adversarial training;
DOI
10.1109/ICASSP39728.2021.9413703
Chinese Library Classification (CLC)
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Adversarial training of end-to-end (E2E) ASR systems using generative adversarial networks (GAN) has recently been explored for low-resource ASR corpora. GANs help to learn the true data representation through a two-player min-max game. However, training an E2E ASR model on a large ASR corpus within a GAN framework has never been explored, because it might take an excessively long time due to high-variance gradient updates and face convergence issues. In this paper, we introduce a novel framework for fine-tuning a pre-trained ASR model with a GAN objective, where the ASR model acts as the generator and a discriminator tries to distinguish the ASR output from the real data. Since the ASR model is pre-trained, we hypothesize that its output (soft distribution vectors) obtains higher scores from the discriminator and makes the discriminator's task harder within our GAN framework, which in turn improves the performance of the ASR model during fine-tuning. Here, the pre-trained ASR model is fine-tuned adversarially against the discriminator using an additional adversarial loss. Experiments on the full LibriSpeech dataset show that our proposed approach outperforms the baselines and conventional GAN-based adversarial models.
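As a rough sketch of the objective described in the abstract (not the authors' implementation), the following PyTorch-style code combines a standard ASR loss with an adversarial loss: the pre-trained ASR model plays the generator, and a small discriminator scores sequences of token distributions as real (one-hot ground truth) or fake (ASR soft outputs). The ASR model interface asr_model(feats, targets), the discriminator architecture, and the weight lambda_adv are illustrative assumptions inferred from the abstract.

import torch
import torch.nn as nn
import torch.nn.functional as F

class TokenSeqDiscriminator(nn.Module):
    # Scores a sequence of token probability vectors of shape (B, T, V) as real or fake.
    def __init__(self, vocab_size, hidden=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(vocab_size, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1))

    def forward(self, token_probs):
        # Per-token logits averaged over time -> one real/fake logit per utterance.
        return self.net(token_probs).mean(dim=1).squeeze(-1)

def fine_tune_step(asr_model, discriminator, feats, targets,
                   opt_g, opt_d, lambda_adv=1.0):
    # One adversarial fine-tuning step: ASR loss + lambda_adv * adversarial loss.
    bce = nn.BCEWithLogitsLoss()

    # Generator (ASR) update: keep the ASR loss while trying to fool the discriminator.
    logits = asr_model(feats, targets)               # assumed output shape (B, T, V)
    soft = F.softmax(logits, dim=-1)                 # soft distribution vectors
    asr_loss = F.cross_entropy(logits.transpose(1, 2), targets)
    ones = torch.ones(soft.size(0), device=soft.device)
    adv_loss = bce(discriminator(soft), ones)        # generator wants "real" labels
    opt_g.zero_grad()
    (asr_loss + lambda_adv * adv_loss).backward()
    opt_g.step()

    # Discriminator update: real = one-hot references, fake = detached ASR output.
    real = F.one_hot(targets, num_classes=soft.size(-1)).float()
    zeros = torch.zeros(soft.size(0), device=soft.device)
    d_loss = (bce(discriminator(real), ones) +
              bce(discriminator(soft.detach()), zeros))
    opt_d.zero_grad()                                # clear grads spilled from the generator pass
    d_loss.backward()
    opt_d.step()
    return asr_loss.item(), adv_loss.item(), d_loss.item()

Because the ASR model is already pre-trained, its soft outputs start close to the one-hot references; this is the property the abstract credits with making the discriminator's task harder and keeping adversarial fine-tuning tractable on a large corpus.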
Pages: 6204-6208
Page count: 5