Auxiliary feature based adaptation of end-to-end ASR systems

Cited by: 22
Authors
Delcroix, Marc [1]
Watanabe, Shinji [2]
Ogawa, Atsunori [1]
Karita, Shigeki [1]
Nakatani, Tomohiro [1]
Affiliations
[1] NTT Corp, NTT Commun Sci Labs, Kyoto, Japan
[2] Johns Hopkins Univ, Ctr Language & Speech Proc, Baltimore, MD USA
Keywords
speech recognition; adaptation; end-to-end; auxiliary feature; SPEECH; TRANSFORMATIONS; MODELS
DOI
10.21437/Interspeech.2018-1438
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Acoustic model adaptation has been widely used to adapt models to speakers or environments. For example, appending auxiliary features that represent speakers, such as i-vectors, to the input of a deep neural network (DNN) is an effective way to realize unsupervised adaptation of DNN-hybrid automatic speech recognition (ASR) systems. Recently, end-to-end (E2E) models have been proposed as an alternative to conventional DNN-hybrid ASR systems. E2E models map a speech signal to a sequence of characters or words using a single neural network, which greatly simplifies the ASR pipeline. However, adaptation of E2E models has so far received little attention. In this paper, we investigate auxiliary feature based adaptation for encoder-decoder E2E models. We employ a recently proposed sequence summary network to compute auxiliary features instead of i-vectors, as it can be easily integrated into E2E models and keeps the ASR pipeline simple. Indeed, the sequence summary network allows the auxiliary feature extraction module to be part of the computational graph of the E2E model. We demonstrate that the proposed adaptation scheme consistently improves recognition performance on three publicly available recognition tasks.
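To make the idea of an auxiliary feature concrete, here is a minimal sketch of one common formulation of a sequence summary network: a small per-frame network whose outputs are averaged over time to give a single utterance-level vector, which is then appended to every input frame. This is an illustration under stated assumptions, not the configuration used in the paper: the function names, layer sizes, tanh nonlinearity, and random weights are all hypothetical, and only the forward pass is shown. In an actual system the summary network would be trained jointly with the E2E model.

```python
import math
import random


def affine(vec, weights, bias):
    """Return weights @ vec + bias, with the weight matrix stored as a list of rows."""
    return [sum(w * x for w, x in zip(row, vec)) + b
            for row, b in zip(weights, bias)]


def sequence_summary(frames, w1, b1, w2, b2):
    """Map a variable-length utterance (list of frames) to one fixed-size vector."""
    outputs = []
    for frame in frames:
        hidden = [math.tanh(h) for h in affine(frame, w1, b1)]
        outputs.append(affine(hidden, w2, b2))
    # Average the per-frame outputs over time to get an utterance-level summary.
    return [sum(column) / len(outputs) for column in zip(*outputs)]


def append_auxiliary(frames, aux):
    """Append the same auxiliary vector to every frame of the utterance."""
    return [frame + aux for frame in frames]


def random_matrix(rows, cols, rng):
    """Small random weights; a stand-in for trained parameters (assumption)."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
T, D, H, A = 50, 40, 16, 8  # frames, feature dim, hidden dim, auxiliary dim (hypothetical)
frames = [[rng.gauss(0.0, 1.0) for _ in range(D)] for _ in range(T)]
w1, b1 = random_matrix(H, D, rng), [0.0] * H
w2, b2 = random_matrix(A, H, rng), [0.0] * A

aux = sequence_summary(frames, w1, b1, w2, b2)
adapted = append_auxiliary(frames, aux)
print(len(aux), len(adapted), len(adapted[0]))  # prints: 8 50 48
```

Because the summary is an average over frames, its size does not depend on utterance length, and every operation is differentiable, which is what lets the extractor sit inside the same computational graph as the recognizer.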
Pages: 2444-2448
Number of pages: 5