Auxiliary feature based adaptation of end-to-end ASR systems

Cited by: 22
Authors
Delcroix, Marc [1]
Watanabe, Shinji [2]
Ogawa, Atsunori [1]
Karita, Shigeki [1]
Nakatani, Tomohiro [1]
Affiliations
[1] NTT Corp, NTT Commun Sci Labs, Kyoto, Japan
[2] Johns Hopkins Univ, Ctr Language & Speech Proc, Baltimore, MD USA
Keywords
speech recognition; adaptation; end-to-end; auxiliary feature; SPEECH; TRANSFORMATIONS; MODELS;
DOI
10.21437/Interspeech.2018-1438
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Acoustic model adaptation has been widely used to adapt models to speakers or environments. For example, appending auxiliary features that represent speakers, such as i-vectors, to the input of a deep neural network (DNN) is an effective way to realize unsupervised adaptation of DNN-hybrid automatic speech recognition (ASR) systems. Recently, end-to-end (E2E) models have been proposed as an alternative to conventional DNN-hybrid ASR systems. E2E models map a speech signal to a sequence of characters or words using a single neural network, which greatly simplifies the ASR pipeline. However, adaptation of E2E models has so far received little attention. In this paper, we investigate auxiliary feature based adaptation for encoder-decoder E2E models. We employ a recently proposed sequence summary network to compute auxiliary features instead of i-vectors, as it can be easily integrated into E2E models and keeps the ASR pipeline simple. Indeed, the sequence summary network allows the auxiliary feature extraction module to be a part of the computational graph of the E2E model. We demonstrate that the proposed adaptation scheme consistently improves recognition performance on three publicly available recognition tasks.
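The idea of a sequence summary network can be illustrated with a minimal sketch. This is not the authors' implementation: the single tanh layer, the random weights, the dimensions, and all function names are hypothetical. The choice to append the summary vector to every frame is also an assumption, borrowed from the i-vector approach the abstract describes, since the abstract does not say how the auxiliary feature is combined with the input. The sketch only shows the core mechanism: a frame-level network whose outputs are averaged over time to give one utterance-level auxiliary vector.

```python
import math
import random


def make_layer(n_in, n_out, rng):
    """Create random weights for one affine layer (illustrative only)."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    bias = [0.0] * n_out
    return weights, bias


def affine_tanh(layer, x):
    """Apply an affine transform followed by tanh to a single frame."""
    weights, bias = layer
    return [math.tanh(sum(w * v for w, v in zip(row, x)) + b)
            for row, b in zip(weights, bias)]


def sequence_summary(layer, frames):
    """Transform every frame, then average over time into one vector."""
    hidden = [affine_tanh(layer, frame) for frame in frames]
    return [sum(column) / len(hidden) for column in zip(*hidden)]


def append_auxiliary(frames, aux):
    """Assumed combination: append the same auxiliary vector to each frame."""
    return [list(frame) + list(aux) for frame in frames]


rng = random.Random(0)
feat_dim, aux_dim, num_frames = 4, 3, 10  # hypothetical sizes
layer = make_layer(feat_dim, aux_dim, rng)
utterance = [[rng.gauss(0.0, 1.0) for _ in range(feat_dim)]
             for _ in range(num_frames)]

aux = sequence_summary(layer, utterance)    # one vector per utterance
adapted = append_auxiliary(utterance, aux)  # input for a downstream encoder
```

Because the summary is computed from the same features with ordinary differentiable operations, such a module could in principle be trained jointly with the recognizer, which is the property the abstract highlights.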
Pages: 2444-2448
Number of pages: 5