End-to-end transformer-based contextual speech recognition based on pointer network

Cited by: 0

Authors:

Lin, Binghuai [1]

Wang, Liyuan [1]

Affiliations:

[1] Tencent Technol Co Ltd, Smart Platform Prod Dept, Shenzhen, Peoples R China

Source:

INTERSPEECH 2021 | 2021

Keywords:

speech recognition; end-to-end; transformer; pointer network; contextual information;

DOI:

10.21437/Interspeech.2021-774

Chinese Library Classification number:

R36 [Pathology]; R76 [Otorhinolaryngology];

Discipline classification code:

100104 ; 100213 ;

Abstract:

Most spoken language assessment systems rely on text features extracted from automatic speech recognition (ASR) transcripts and thus depend heavily on the accuracy of the ASR system. Automatic speech scoring tasks such as reading aloud and spontaneous speech commonly come with prompts, provided in advance to guide test takers' answers, which contain information that should appear in the answers (e.g., a listening passage or a sample response). Using these texts to improve ASR performance is of great importance for these tasks. In this paper, we develop an end-to-end (E2E) ASR system that incorporates the contextual information provided by prompts. Specifically, we add an extra prompt encoder to a transformer-based E2E ASR system. To fuse the probabilities of the ASR output and the prompts dynamically, we train a soft gate based on the pointer network with a carefully constructed prompt training corpus. We evaluate the proposed method on data collected from English speaking proficiency tests taken by Chinese teenagers aged 16 to 18. The results show improved speech recognition performance, with a nearly 50% drop in word error rate (WER) when prompts are used. Furthermore, the proposed network performs well in recognizing rare words such as locations and personal names.
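
The fusion step described in the abstract can be illustrated with a minimal sketch. It assumes a pointer-generator style mixture, in which a scalar gate weights the decoder's vocabulary distribution against a copy distribution built from attention over the prompt tokens. The abstract does not specify how the gate or the attention is computed, so the function names, the fixed gate value, and the toy numbers below are all hypothetical and are not the authors' implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_with_prompt(vocab_probs, prompt_ids, prompt_attention, gate):
    """Mix an ASR vocabulary distribution with a copy distribution over a prompt.

    vocab_probs:      decoder distribution over the vocabulary (sums to 1)
    prompt_ids:       vocabulary id of each prompt token
    prompt_attention: attention weight on each prompt token (sums to 1)
    gate:             soft gate in [0, 1]; weight given to the decoder
                      (assumed to be a learned scalar in a real model)
    """
    if not 0.0 <= gate <= 1.0:
        raise ValueError("gate must lie in [0, 1]")
    # Scatter attention mass onto vocabulary ids; repeated prompt tokens add up.
    copy_probs = [0.0] * len(vocab_probs)
    for token_id, weight in zip(prompt_ids, prompt_attention):
        copy_probs[token_id] += weight
    return [gate * p + (1.0 - gate) * c for p, c in zip(vocab_probs, copy_probs)]


# Toy example: a 4-word vocabulary where id 3 is a rare word found in the prompt.
vocab_probs = softmax([1.0, 0.5, 0.2, 0.1])
prompt_ids = [3, 1, 3]
prompt_attention = softmax([2.0, 0.0, 1.0])
fused = fuse_with_prompt(vocab_probs, prompt_ids, prompt_attention, gate=0.4)
```

Because both inputs are probability distributions and the gate is a convex weight, the fused output is again a distribution. In this toy case the rare word (id 3) gains probability from the prompt, which is the kind of effect the abstract reports for locations and personal names.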

Pages: 2087-2091

Page count: 5
