ATTENTION-BASED END-TO-END SPEECH RECOGNITION ON VOICE SEARCH

Cited by: 0

Authors:

Shan, Changhao [1,2]

Zhang, Junbo [2]

Wang, Yujun [2]

Xie, Lei [1]

Affiliations:

[1] Northwestern Polytech Univ, Sch Comp Sci, Shaanxi Prov Key Lab Speech & Image Informat Proc, Xian, Shaanxi, Peoples R China

[2] Xiaomi Inc, Beijing, Peoples R China

Source:

2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2018

Keywords:

automatic speech recognition; end-to-end speech recognition; attention model; voice search; NEURAL-NETWORKS;

DOI:

Not available

Chinese Library Classification:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

Recently, there has been growing interest in end-to-end speech recognition that directly transcribes speech to text without any predefined alignments. In this paper, we explore the use of an attention-based encoder-decoder model for Mandarin speech recognition on a voice search task. Previous attempts have shown that applying attention-based encoder-decoder models to Mandarin speech recognition is quite difficult due to the logographic orthography of Mandarin, the large vocabulary, and the conditional dependency of the attention model. We use character embedding to deal with the large vocabulary. Several tricks are used for effective model training, including L2 regularization, Gaussian weight noise, and frame skipping. We compare two attention mechanisms and use attention smoothing to cover long context in the attention model. Taken together, these tricks allow us to achieve a character error rate (CER) of 3.58% and a sentence error rate (SER) of 7.43% on the MiTV voice search dataset. Combined with a trigram language model, the CER and SER reach 2.81% and 5.77%, respectively.


Pages: 4764-4768

Page count: 5

50 items in total

[1] END-TO-END ATTENTION-BASED LARGE VOCABULARY SPEECH RECOGNITION
Bahdanau, Dzmitry
Chorowski, Jan
Serdyuk, Dmitriy
Brakel, Philemon
Bengio, Yoshua
[J]. 2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 4945 - 4949
[2] Speaker Adaptation for Attention-Based End-to-End Speech Recognition
Meng, Zhong
Gaur, Yashesh
Li, Jinyu
Gong, Yifan
[J]. INTERSPEECH 2019, 2019, : 241 - 245
[3] CHARACTER-AWARE ATTENTION-BASED END-TO-END SPEECH RECOGNITION
Meng, Zhong
Gaur, Yashesh
Li, Jinyu
Gong, Yifan
[J]. 2019 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU 2019), 2019, : 949 - 955
[4] AN ANALYSIS OF DECODING FOR ATTENTION-BASED END-TO-END MANDARIN SPEECH RECOGNITION
Jiang, Dongwei
Zou, Wei
Zhao, Shuaijiang
Yang, Guilin
Li, Xiangang
[J]. 2018 11TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2018, : 384 - 388
[5] EXPLICIT ALIGNMENT OF TEXT AND SPEECH ENCODINGS FOR ATTENTION-BASED END-TO-END SPEECH RECOGNITION
Drexler, Jennifer
Glass, James
[J]. 2019 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU 2019), 2019, : 913 - 919
[6] STREAMING ATTENTION-BASED MODELS WITH AUGMENTED MEMORY FOR END-TO-END SPEECH RECOGNITION
Yeh, Ching-Feng
Wang, Yongqiang
Shi, Yangyang
Wu, Chunyang
Zhang, Frank
Chan, Julian
Seltzer, Michael L.
[J]. 2021 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP (SLT), 2021, : 8 - 14
[7] STREAM ATTENTION-BASED MULTI-ARRAY END-TO-END SPEECH RECOGNITION
Wang, Xiaofei
Li, Ruizhi
Mallidi, Sri Harish
Hori, Takaaki
Watanabe, Shinji
Hermansky, Hynek
[J]. 2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 7105 - 7109
[8] Attention based end to end Speech Recognition for Voice Search in Hindi and English
Joshi, Raviraj
Kannan, Venkateshan
[J]. FIRE 2021: PROCEEDINGS OF THE 13TH ANNUAL MEETING OF THE FORUM FOR INFORMATION RETRIEVAL EVALUATION, 2021, : 107 - 113
[9] Towards Efficiently Learning Monotonic Alignments for Attention-Based End-to-End Speech Recognition
Miao, Chenfeng
Zou, Kun
Zhuang, Ziyang
Wei, Tao
Ma, Jun
Wang, Shaojun
Xiao, Jing
[J]. INTERSPEECH 2022, 2022, : 1051 - 1055
[10] Attention-based latent features for jointly trained end-to-end automatic speech recognition with modified speech enhancement
Yang, Da-Hee
Chang, Joon-Hyuk
[J]. JOURNAL OF KING SAUD UNIVERSITY-COMPUTER AND INFORMATION SCIENCES, 2023, 35 (03) : 202 - 210
