Improving Attention-based End-to-end ASR by Incorporating an N-gram Neural Network

Cited by: 0
Authors
Ao, Junyi [1 ]
Ko, Tom [1 ]
Affiliations
[1] Southern Univ Sci & Technol, Dept Comp Sci & Engn, Shenzhen, Peoples R China
Keywords
speech recognition; N-gram; coda; attention-based; end-to-end model;
DOI
10.1109/ISCSLP49672.2021.9362055
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
In attention-based end-to-end ASR, the intrinsic LM is modeled by an RNN and forms the major part of the decoder. Compared with external LMs, the intrinsic LM is considered modest, as it is trained only on the transcriptions associated with the speech data. Although it is common practice to interpolate the scores of the end-to-end model and an external LM, the need for an external model undermines the end-to-end nature of the approach. Therefore, researchers are investigating different ways of improving the intrinsic LM of the end-to-end model. Observing that N-gram LMs and RNN LMs can complement each other, we investigate the effect of implementing an N-gram neural network inside the end-to-end model. In this paper, we examine two implementations of the N-gram neural network in the context of attention-based end-to-end ASR. We find that both implementations improve over the baseline, and that CBOW (Continuous Bag-of-Words) performs slightly better. We further propose a way to minimize the size of the N-gram component by utilizing the coda information of the modeling units. Experiments on the LibriSpeech dataset show that our proposed method achieves a clear improvement with only a slight increase in model parameters.
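The abstract describes placing a CBOW-style N-gram neural network inside the attention decoder, alongside the usual score interpolation with an external LM (commonly realized as shallow fusion, i.e. score = log P_e2e(y|x) + lambda * log P_LM(y)). Below is a minimal sketch of how such an N-gram component could be wired in, assuming PyTorch; the class name CBOWNgramFusion, the concatenation-based fusion, and all dimensions are illustrative assumptions, not the authors' implementation, which the abstract does not detail.

    # Hedged sketch (not the authors' code): a CBOW-style N-gram component that
    # averages the embeddings of the previous N-1 output tokens and mixes the
    # result into the decoder state before the output projection.
    import torch
    import torch.nn as nn

    class CBOWNgramFusion(nn.Module):
        def __init__(self, vocab_size: int, embed_dim: int, decoder_dim: int, n: int = 3):
            super().__init__()
            self.n = n                                 # N-gram order (uses N-1 history tokens)
            self.embed = nn.Embedding(vocab_size, embed_dim)
            self.proj = nn.Linear(decoder_dim + embed_dim, vocab_size)

        def forward(self, decoder_state: torch.Tensor, history: torch.Tensor) -> torch.Tensor:
            # history: (batch, n-1) previous token ids; decoder_state: (batch, decoder_dim)
            context = self.embed(history).mean(dim=1)  # CBOW: average the history embeddings
            fused = torch.cat([decoder_state, context], dim=-1)
            return self.proj(fused)                    # logits over the output vocabulary

    # Minimal usage example with dummy shapes
    model = CBOWNgramFusion(vocab_size=100, embed_dim=32, decoder_dim=64, n=3)
    state = torch.randn(4, 64)                         # decoder hidden state for a batch of 4
    hist = torch.randint(0, 100, (4, 2))               # previous N-1 = 2 token ids per sample
    logits = model(state, hist)                        # shape (4, 100)

In this sketch the N-gram context is a simple average of history embeddings, so the extra parameter cost is essentially one embedding table and a slightly wider output projection, which is consistent with the abstract's claim of only a slight increase in model parameters.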
Pages: 5