Layer-wise Fast Adaptation for End-to-End Multi-Accent Speech Recognition

Cited by: 0

Authors:

Gong, Xun [1]

Lu, Yizhou [1]

Zhou, Zhikai [1]

Qian, Yanmin [1]

Affiliations:

[1] Shanghai Jiao Tong Univ, MoE Key Lab Artificial Intelligence, AI Inst, X-LANCE Lab, Dept Comp Sci & Engn, Shanghai, Peoples R China

Source:

INTERSPEECH 2021 | 2021

Keywords:

automatic speech recognition; multi-accent; layer-wise adaptation; end-to-end; MIXTURE;

DOI:

10.21437/Interspeech.2021-1075

Chinese Library Classification number:

R36 [Pathology]; R76 [Otorhinolaryngology];

Discipline classification code:

100104; 100213;

Abstract:

Accent variability poses a major challenge to automatic speech recognition (ASR) modeling. Although adaptation systems based on one-hot accent vectors are commonly used, they require prior knowledge of the target accent and cannot handle unseen accents. Furthermore, simply concatenating accent embeddings does not make good use of accent knowledge, which limits the improvement. In this work, we tackle these problems with a novel layer-wise adaptation structure injected into the encoder of an end-to-end (E2E) ASR model. The adapter layer encodes an arbitrary accent in the accent space and assists the ASR model in recognizing accented speech. Given an utterance, the adaptation structure extracts the corresponding accent information and transforms the input acoustic feature into an accent-related feature through a linear combination of all accent bases. We further explore the injection position of the adaptation layer, the number of accent bases, and different types of accent bases to achieve better accent adaptation. Experimental results show that, compared to the baseline, the proposed adaptation structure brings 12% and 10% relative word error rate (WER) reduction on the AESRC2020 accent dataset and the LibriSpeech dataset, respectively.
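
The linear combination of accent bases described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how the combination weights are computed or what form each basis takes, so the sketch assumes softmax weights predicted from an accent embedding and assumes each basis is an element-wise scale and shift of the input feature. The names `adapt_frame`, `predictor`, and `bases` are hypothetical.

```python
import math


def softmax(scores):
    """Normalize raw scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def adapt_frame(feature, accent_embedding, predictor, bases):
    """Turn one acoustic feature frame into an accent-related feature.

    Assumptions (not taken from the paper):
      - `predictor` is a weight matrix mapping the accent embedding to
        one score per basis, normalized with a softmax;
      - each basis is a (scale, shift) pair applied element-wise.
    """
    weights = softmax(matvec(predictor, accent_embedding))
    adapted = [0.0] * len(feature)
    for alpha, (scale, shift) in zip(weights, bases):
        for i, x in enumerate(feature):
            # Weighted contribution of this basis to the output frame.
            adapted[i] += alpha * (scale[i] * x + shift[i])
    return adapted, weights


# Toy example: two bases, two-dimensional features.
feature = [1.0, 2.0]
accent_embedding = [0.0, 0.0]          # yields equal scores for both bases
predictor = [[1.0, 0.0], [0.0, 1.0]]
bases = [
    ([1.0, 1.0], [0.0, 0.0]),          # identity-like basis
    ([3.0, 3.0], [2.0, 2.0]),          # scaled and shifted basis
]
adapted, weights = adapt_frame(feature, accent_embedding, predictor, bases)
# weights == [0.5, 0.5]; adapted == [3.0, 5.0]
```

Because the weights are computed from an embedding and not from a one-hot accent label, a sketch of this kind can mix the bases for an accent that was not seen in training, which is the limitation of one-hot adaptation that the abstract points out.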


Pages: 1274-1278

Page count: 5
