LOW-FREQUENCY CHARACTER CLUSTERING FOR END-TO-END ASR SYSTEM

Times cited: 0
Authors
Ito, Hitoshi [1 ]
Hagiwara, Aiko [1 ]
Ichiki, Manon [1 ]
Kobayakawa, Takeshi [1 ]
Mishima, Takeshi [1 ]
Sato, Shoei [1 ]
Kobayashi, Akio [2 ]
Affiliations
[1] NHK Japan Broadcasting Corp, Shibuya, Japan
[2] Tsukuba Univ Technol, Tsukuba, Ibaraki, Japan
Keywords
end-to-end ASR; acoustic modeling; connectionist temporal classification; long short-term memory;
DOI
Not available
Chinese Library Classification (CLC)
TM [Electrical engineering]; TN [Electronic and communication technology];
Discipline classification code
0808; 0809;
Abstract
We developed a label-designing and restoration method for end-to-end automatic speech recognition based on connectionist temporal classification (CTC). With an end-to-end speech-recognition system that includes thousands of output labels such as words or characters, it is difficult to train a robust model because of data sparsity. With our proposed method, characters with less training data are estimated using the context of a language model rather than the acoustic features. Our method involves two steps. First, we train acoustic models using 70 class labels instead of thousands of low-frequency labels. Second, the class labels are restored to the original labels by using a weighted finite-state transducer and an n-gram language model. We applied the proposed method to a Japanese end-to-end automatic speech-recognition system with over 3,000 character labels. Experimental results indicate that our method improves the word error rate by a maximum of 15.5% relative compared with a conventional CTC-based method and achieves performance comparable to state-of-the-art hybrid DNN methods.
Pages: 187-191
Number of pages: 5
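
The abstract above outlines a two-step procedure: low-frequency characters are first merged into 70 class labels for CTC acoustic-model training, and the class labels are later restored to characters with a weighted finite-state transducer and an n-gram language model. The following Python sketch illustrates only the first step under stated assumptions; the grouping rule (frequency-rank buckets), the threshold MIN_COUNT, and all identifiers are hypothetical, since the abstract does not specify how the clusters are formed.

from collections import Counter

NUM_CLASSES = 70   # number of class labels reported in the abstract
MIN_COUNT = 100    # hypothetical threshold separating frequent from rare characters

def build_label_map(transcripts, num_classes=NUM_CLASSES, min_count=MIN_COUNT):
    """Map each character to itself (frequent) or to a cluster label such as '<C07>' (rare)."""
    counts = Counter(ch for line in transcripts for ch in line)
    # Frequent characters keep their own output label.
    label_map = {ch: ch for ch, c in counts.items() if c >= min_count}
    rare = sorted((ch for ch, c in counts.items() if c < min_count),
                  key=lambda ch: counts[ch])
    # Spread the rare characters over the small set of class labels.
    for i, ch in enumerate(rare):
        label_map[ch] = f"<C{i % num_classes:02d}>"
    return label_map

def relabel(transcript, label_map):
    """Rewrite a transcript into the reduced label set used as CTC training targets."""
    return [label_map.get(ch, ch) for ch in transcript]

if __name__ == "__main__":
    corpus = ["今日のニュースです", "珍しい漢字が出ます"]  # toy transcripts
    label_map = build_label_map(corpus, min_count=2)
    print(relabel(corpus[1], label_map))

The second step, restoring class labels to the original characters by composing the decoder output with a WFST and an n-gram language model, is not sketched here because it depends on an FST toolkit and on the lattice format produced by the decoder.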