FINE-TUNING OF PRE-TRAINED END-TO-END SPEECH RECOGNITION WITH GENERATIVE ADVERSARIAL NETWORKS

Cited by: 4
Authors
Haidar, Md Akmal [1 ]
Rezagholizadeh, Mehdi [1 ]
Affiliations
[1] Montreal Res Ctr, Huawei Noah's Ark Lab, Montreal, PQ, Canada
Keywords
automatic speech recognition; sequence-to-sequence; transformer; generative adversarial networks; adversarial training;
DOI
10.1109/ICASSP39728.2021.9413703
Chinese Library Classification (CLC)
O42 [Acoustics];
Discipline classification codes
070206 ; 082403 ;
Abstract
Adversarial training of end-to-end (E2E) ASR systems using generative adversarial networks (GANs) has recently been explored for low-resource ASR corpora. GANs help to learn the true data representation through a two-player min-max game. However, training an E2E ASR model on a large ASR corpus within a GAN framework has not been explored, as it might take an excessively long time due to high-variance gradient updates and might face convergence issues. In this paper, we introduce a novel framework for fine-tuning a pre-trained ASR model with a GAN objective, where the ASR model acts as a generator and a discriminator tries to distinguish the ASR output from the real data. Since the ASR model is pre-trained, we hypothesize that its output (soft distribution vectors) obtains higher scores from the discriminator and makes the discriminator's task harder within our GAN framework, which in turn improves the performance of the ASR model in the fine-tuning stage. Here, the pre-trained ASR model is fine-tuned adversarially against the discriminator using an additional adversarial loss. Experiments on the full LibriSpeech dataset show that our proposed approach outperforms baselines and conventional GAN-based adversarial models.
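As a concrete illustration of the training scheme the abstract describes, below is a minimal PyTorch sketch of one plausible realization, not the authors' implementation: the pre-trained ASR model acts as the generator and emits soft token distributions, the discriminator scores distribution sequences as real (one-hot ground-truth transcripts) or fake (ASR output), and the ASR model is fine-tuned with its standard loss plus a weighted adversarial term. TinyASR, Discriminator, lambda_adv, and all dimensions are hypothetical stand-ins; the paper's actual generator is a transformer-based E2E ASR model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB = 32   # hypothetical vocabulary size
T = 50       # hypothetical output sequence length

class TinyASR(nn.Module):
    """Stand-in for a pre-trained E2E ASR model (the generator)."""
    def __init__(self, feat_dim=80, hidden=128):
        super().__init__()
        self.rnn = nn.GRU(feat_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, VOCAB)

    def forward(self, feats):            # feats: (B, T, feat_dim)
        h, _ = self.rnn(feats)
        return self.out(h)               # logits: (B, T, VOCAB)

class Discriminator(nn.Module):
    """Scores a sequence of token distributions as real or fake."""
    def __init__(self, hidden=128):
        super().__init__()
        self.rnn = nn.GRU(VOCAB, hidden, batch_first=True)
        self.score = nn.Linear(hidden, 1)

    def forward(self, dists):            # dists: (B, T, VOCAB)
        _, h = self.rnn(dists)
        return self.score(h[-1])         # real/fake logit: (B, 1)

asr, disc = TinyASR(), Discriminator()
opt_g = torch.optim.Adam(asr.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(disc.parameters(), lr=1e-4)
bce = nn.BCEWithLogitsLoss()
lambda_adv = 0.1                         # assumed adversarial-loss weight

def train_step(feats, targets):          # targets: (B, T) token ids
    logits = asr(feats)
    soft = F.softmax(logits, dim=-1)               # "fake": soft distributions
    real = F.one_hot(targets, VOCAB).float()       # "real": one-hot transcripts
    ones = torch.ones(feats.size(0), 1)
    zeros = torch.zeros(feats.size(0), 1)

    # Discriminator update: tell ground-truth transcripts from ASR output.
    # detach() stops this update from reaching the ASR weights.
    opt_d.zero_grad()
    d_loss = bce(disc(real), ones) + bce(disc(soft.detach()), zeros)
    d_loss.backward()
    opt_d.step()

    # Generator (ASR) update: standard ASR loss plus an adversarial term
    # that rewards fooling the (just-updated) discriminator.
    opt_g.zero_grad()
    asr_loss = F.cross_entropy(logits.reshape(-1, VOCAB), targets.reshape(-1))
    adv_loss = bce(disc(soft), ones)
    (asr_loss + lambda_adv * adv_loss).backward()
    opt_g.step()

# Smoke test with random stand-in features and transcripts.
feats = torch.randn(4, T, 80)
targets = torch.randint(0, VOCAB, (4, T))
train_step(feats, targets)
```

Note that soft.detach() keeps the discriminator update from backpropagating into the pre-trained ASR weights; only the weighted adversarial term in the generator step fine-tunes them against the discriminator, matching the abstract's description of an additional adversarial loss on top of the standard objective.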
Pages: 6204-6208
Number of pages: 5