End to end transformer-based contextual speech recognition based on pointer network

Authors
Lin, Binghuai [1]
Wang, Liyuan [1]
Affiliation
[1] Tencent Technol Co Ltd, Smart Platform Prod Dept, Shenzhen, Peoples R China
Source
INTERSPEECH 2021
Keywords
speech recognition; end-to-end; transformer; pointer network; contextual information
DOI
10.21437/Interspeech.2021-774
Chinese Library Classification
R36 [Pathology]; R76 [Otorhinolaryngology]
Subject classification codes
100104; 100213
Abstract
Most spoken language assessment systems rely on text features extracted from automatic speech recognition (ASR) transcripts and therefore depend heavily on the accuracy of the ASR system. In automatic speech scoring tasks such as read-aloud and spontaneous speech, test takers are commonly given prompts in advance to guide their answers; these prompts contain information that the answers should include (e.g., the listening passage and sample responses). Using these texts to improve ASR performance is therefore important for such tasks. In this paper, we develop an end-to-end (E2E) ASR system that incorporates the contextual information provided by the prompts. Specifically, we add an extra prompt encoder to a transformer-based E2E ASR system. To fuse the probabilities of the ASR output and the prompts dynamically, we train a soft gate based on the pointer network, using a carefully constructed prompt training corpus. We evaluate the proposed method on data collected from English speaking proficiency tests taken by Chinese teenagers aged 16 to 18. The results show that using prompts improves speech recognition, reducing the word error rate (WER) by nearly 50%. Furthermore, the proposed network performs well at recognizing rare words such as place names and personal names.
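The gated fusion described in the abstract resembles the general pointer-generator pattern: a scalar soft gate g in (0, 1) weights the ASR decoder's vocabulary distribution against an attention ("copy") distribution over prompt tokens. The following is a minimal sketch under that assumption, not the authors' implementation. The linear gate over the concatenated decoder state and prompt context vector, the function names `gate_logit` and `fuse_with_prompt`, and the toy vocabulary and scores are all hypothetical and chosen only for illustration.

```python
import math
from typing import Dict, List, Sequence


def sigmoid(x: float) -> float:
    """Logistic function, used to squash the gate logit into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over prompt attention scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gate_logit(
    decoder_state: Sequence[float],
    prompt_context: Sequence[float],
    weights: Sequence[float],
    bias: float,
) -> float:
    """Hypothetical linear gate over the concatenation [decoder_state; prompt_context]."""
    features = list(decoder_state) + list(prompt_context)
    if len(features) != len(weights):
        raise ValueError("weights must match the concatenated feature size")
    return sum(w * x for w, x in zip(weights, features)) + bias


def fuse_with_prompt(
    asr_probs: Dict[str, float],
    prompt_tokens: Sequence[str],
    prompt_scores: Sequence[float],
    logit: float,
) -> Dict[str, float]:
    """Mix the ASR vocabulary distribution with a copy distribution over prompt tokens.

    P(w) = g * P_asr(w) + (1 - g) * (total attention on prompt positions holding w)
    """
    g = sigmoid(logit)  # soft gate: share of probability mass kept by the ASR decoder
    attention = softmax(prompt_scores)  # pointer distribution over prompt positions
    fused = {tok: g * p for tok, p in asr_probs.items()}
    for tok, a in zip(prompt_tokens, attention):
        # A token may occur several times in the prompt; its copy weights accumulate.
        fused[tok] = fused.get(tok, 0.0) + (1.0 - g) * a
    return fused


# Toy usage with made-up numbers: the gate logit works out to 0.0, so g = 0.5.
logit = gate_logit([1.0, 2.0], [0.5, -1.0], [0.5, -0.25, 1.0, 0.0], bias=-0.5)
fused = fuse_with_prompt(
    {"the": 0.5, "hangzhou": 0.05, "<unk>": 0.45},  # hypothetical ASR output
    ["visit", "hangzhou"],  # hypothetical prompt tokens
    [0.0, 0.0],  # equal attention on both prompt positions
    logit,
)
print(max(fused, key=fused.get))  # hangzhou
```

In this toy case the place name "hangzhou" receives only 0.05 from the ASR decoder, but the copy term adds 0.25, so it becomes the most probable token (0.275). Because the copy term only moves probability mass onto tokens that appear in the prompt, and both input distributions sum to one, the fused distribution also sums to one.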
Pages: 2087-2091
Page count: 5