Auxiliary feature based adaptation of end-to-end ASR systems

Cited by: 22
Authors
Delcroix, Marc [1]
Watanabe, Shinji [2]
Ogawa, Atsunori [1]
Karita, Shigeki [1]
Nakatani, Tomohiro [1]
Affiliations
[1] NTT Corp, NTT Commun Sci Labs, Kyoto, Japan
[2] Johns Hopkins Univ, Ctr Language & Speech Proc, Baltimore, MD USA
Keywords
speech recognition; adaptation; end-to-end; auxiliary feature; SPEECH; TRANSFORMATIONS; MODELS;
DOI
10.21437/Interspeech.2018-1438
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Acoustic model adaptation has been widely used to adapt models to speakers or environments. For example, appending auxiliary features that represent speakers, such as i-vectors, to the input of a deep neural network (DNN) is an effective way to realize unsupervised adaptation of DNN-hybrid automatic speech recognition (ASR) systems. Recently, end-to-end (E2E) models have been proposed as an alternative to conventional DNN-hybrid ASR systems. E2E models map a speech signal to a sequence of characters or words using a single neural network, which greatly simplifies the ASR pipeline. However, adaptation of E2E models has so far received little attention. In this paper, we investigate auxiliary feature based adaptation for encoder-decoder E2E models. We employ a recently proposed sequence summary network to compute auxiliary features instead of i-vectors, as it can be easily integrated into E2E models and keeps the ASR pipeline simple. Indeed, the sequence summary network allows the auxiliary feature extraction module to be a part of the computational graph of the E2E model. We demonstrate that the proposed adaptation scheme consistently improves recognition performance on three publicly available recognition tasks.
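The abstract does not spell out the architecture of the sequence summary network, so the following is only a minimal sketch of the general idea under stated assumptions: each frame is passed through a single tanh layer, the outputs are averaged over time to give one utterance-level vector, and that vector is appended to every input frame. The function names, layer sizes, the single-layer design, and the choice of appending (rather than some other way of combining) are all illustrative assumptions, not the authors' implementation.

```python
import math
import random


def sequence_summary(frames, weights, bias):
    """Apply one tanh layer to each frame, then average the outputs over time.

    Assumption: a single layer stands in for whatever network the paper uses.
    """
    dim_out = len(bias)
    summary = [0.0] * dim_out
    for frame in frames:
        for j in range(dim_out):
            act = bias[j] + sum(w * x for w, x in zip(weights[j], frame))
            summary[j] += math.tanh(act)
    return [s / len(frames) for s in summary]


def append_auxiliary(frames, aux):
    """Append the same utterance-level vector to every frame (assumed scheme)."""
    return [list(frame) + list(aux) for frame in frames]


# Hypothetical toy dimensions and random data, for illustration only.
random.seed(0)
feat_dim, aux_dim, num_frames = 4, 2, 5
frames = [[random.gauss(0, 1) for _ in range(feat_dim)] for _ in range(num_frames)]
weights = [[random.gauss(0, 0.1) for _ in range(feat_dim)] for _ in range(aux_dim)]
bias = [0.0] * aux_dim

aux = sequence_summary(frames, weights, bias)
adapted = append_auxiliary(frames, aux)
print(len(adapted), len(adapted[0]))
```

Because the summary is computed from the utterance itself by differentiable operations, such a module could in principle be trained jointly with the recognizer, which is the property the abstract highlights when it says the extractor becomes part of the computational graph.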
Pages: 2444 - 2448
Number of pages: 5
Related papers
50 in total
  • [31] Joint Grapheme and Phoneme Embeddings for Contextual End-to-End ASR
    Chen, Zhehuai
    Jain, Mahaveer
    Wang, Yongqiang
    Seltzer, Michael L.
    Fuegen, Christian
    [J]. INTERSPEECH 2019, 2019, : 3490 - 3494
  • [32] Comparison and analysis of new curriculum criteria for end-to-end ASR
    Karakasidis, Georgios
    Kurimo, Mikko
    Bell, Peter
    Grosz, Tamas
    [J]. SPEECH COMMUNICATION, 2024, 163
  • [33] End-to-end ASR to jointly predict transcriptions and linguistic annotations
    Omachi, Motoi
    Fujita, Yuya
    Watanabe, Shinji
    Wiesner, Matthew
    [J]. 2021 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL-HLT 2021), 2021, : 1861 - 1871
  • [34] Iterative Compression of End-to-End ASR Model using AutoML
    Mehrotra, Abhinav
    Dudziak, Lukasz
    Yeo, Jinsu
    Lee, Young-yoon
    Vipperla, Ravichander
    Abdelfattah, Mohamed S.
    Bhattacharya, Sourav
    Ishtiaq, Samin
    Ramos, Alberto Gil C. P.
    Lee, SangJeong
    Kim, Daehyun
    Lane, Nicholas D.
    [J]. INTERSPEECH 2020, 2020, : 3361 - 3365
  • [35] Data Augmentation Using CycleGAN for End-to-End Children ASR
    Singh, Dipesh K.
    Amin, Preet P.
    Sailor, Hardik B.
    Patil, Hemant A.
    [J]. 29TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO 2021), 2021, : 511 - 515
  • [36] Multi-Modal Data Augmentation for End-to-End ASR
    Renduchintala, Adithya
    Ding, Shuoyang
    Wiesner, Matthew
    Watanabe, Shinji
    [J]. 19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 2394 - 2398
  • [37] A Simple Baseline for Domain Adaptation in End to End ASR Systems Using Synthetic Data
    Joshi, Raviraj
    Singh, Anupam
    [J]. PROCEEDINGS OF THE 5TH WORKSHOP ON E-COMMERCE AND NLP (ECNLP 5), 2022, : 244 - 249
  • [38] End-to-End ASR with Adaptive Span Self-Attention
    Chang, Xuankai
    Subramanian, Aswin Shanmugam
    Guo, Pengcheng
    Watanabe, Shinji
    Fujita, Yuya
    Omachi, Motoi
    [J]. INTERSPEECH 2020, 2020, : 3595 - 3599
  • [39] TWO-PASS END-TO-END ASR MODEL COMPRESSION
    Dawalatabad, Nauman
    Vatsal, Tushar
    Gupta, Ashutosh
    Kim, Sungsoo
    Singh, Shatrughan
    Gowda, Dhananjaya
    Kim, Chanwoo
    [J]. 2021 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU), 2021, : 403 - 410
  • [40] An End-to-End Framework for Clothing Collocation Based on Semantic Feature Fusion
    Zhao, Mingbo
    Liu, Yu
    Li, Xianrui
    Zhang, Zhao
    Zhang, Yue
    [J]. IEEE MULTIMEDIA, 2020, 27 (04) : 122 - 132