FINE-TUNING OF PRE-TRAINED END-TO-END SPEECH RECOGNITION WITH GENERATIVE ADVERSARIAL NETWORKS

Cited by: 4
Authors:
Haidar, Md Akmal [1 ]
Rezagholizadeh, Mehdi [1 ]
Affiliations:
[1] Huawei Noah's Ark Lab, Montreal Research Centre, Montreal, QC, Canada
Keywords:
automatic speech recognition; sequence-to-sequence; transformer; generative adversarial networks; adversarial training
DOI:
10.1109/ICASSP39728.2021.9413703
CLC classification (Chinese Library Classification):
O42 [Acoustics]
Subject classification codes:
070206; 082403
Abstract:
Adversarial training of end-to-end (E2E) ASR systems using generative adversarial networks (GANs) has recently been explored for low-resource ASR corpora. GANs help to learn the true data representation through a two-player min-max game. However, training an E2E ASR model on a large ASR corpus within a GAN framework has never been explored, likely because it would take an excessively long time due to high-variance gradient updates and could face convergence issues. In this paper, we introduce a novel framework for fine-tuning a pre-trained ASR model with a GAN objective, where the ASR model acts as the generator and a discriminator tries to distinguish the ASR output from the real data. Since the ASR model is pre-trained, we hypothesize that its output (soft distribution vectors) earns higher scores from the discriminator, making the discriminator's task harder within our GAN framework, which in turn improves the ASR model's performance during the fine-tuning stage. Here, the pre-trained ASR model is fine-tuned adversarially against the discriminator using an additional adversarial loss. Experiments on the full LibriSpeech dataset show that our proposed approach outperforms the baselines and conventional GAN-based adversarial models.
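To make the fine-tuning recipe in the abstract concrete, below is a minimal PyTorch sketch of one training step under stated assumptions: the pre-trained ASR model plays the generator, a small discriminator scores sequences of per-token probability distributions as real (one-hot ground-truth transcripts) or fake (the ASR model's soft outputs), and the ASR model is updated with its usual loss plus a weighted adversarial term, L = L_asr + lambda_adv * L_adv. The Discriminator class, the asr_model(speech, transcripts) interface returning (loss, logits), and the weight lambda_adv are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Discriminator(nn.Module):
    """Scores a sequence of token probability distributions as real or fake.
    (Illustrative architecture; the paper's discriminator may differ.)"""
    def __init__(self, vocab_size, hidden=256):
        super().__init__()
        self.rnn = nn.GRU(vocab_size, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, dists):                   # dists: (batch, time, vocab)
        _, h = self.rnn(dists)                  # h: (1, batch, hidden)
        return self.head(h[-1]).squeeze(-1)     # one real/fake logit per sequence

def fine_tune_step(asr_model, disc, opt_asr, opt_disc,
                   speech, transcripts, lambda_adv=1.0):
    batch = transcripts.size(0)
    ones, zeros = torch.ones(batch), torch.zeros(batch)

    # Generator pass: the pre-trained ASR model emits soft distribution vectors.
    # `asr_model` returning (supervised loss, per-token logits) is an assumed interface.
    asr_loss, logits = asr_model(speech, transcripts)
    fake = F.softmax(logits, dim=-1)                        # (batch, time, vocab)
    real = F.one_hot(transcripts, logits.size(-1)).float()  # ground-truth one-hots

    # Discriminator update: tell real one-hots apart from detached ASR outputs.
    d_loss = (F.binary_cross_entropy_with_logits(disc(real), ones) +
              F.binary_cross_entropy_with_logits(disc(fake.detach()), zeros))
    opt_disc.zero_grad()
    d_loss.backward()
    opt_disc.step()

    # ASR (generator) update: standard ASR loss plus an adversarial loss that
    # rewards soft outputs the discriminator scores as real. Gradients that this
    # pass leaves in the discriminator are cleared by opt_disc.zero_grad() above
    # on the next call.
    adv_loss = F.binary_cross_entropy_with_logits(disc(fake), ones)
    opt_asr.zero_grad()
    (asr_loss + lambda_adv * adv_loss).backward()
    opt_asr.step()
    return asr_loss.item(), adv_loss.item(), d_loss.item()
```

Note that, as in the paper's framework, the generator is never trained from scratch: this loop only fine-tunes an already converged ASR model, with lambda_adv balancing the supervised recognition loss against the adversarial signal.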
Pages: 6204-6208 (5 pages)