LOW-FREQUENCY CHARACTER CLUSTERING FOR END-TO-END ASR SYSTEM

Authors
Ito, Hitoshi [1]
Hagiwara, Aiko [1]
Ichiki, Manon [1]
Kobayakawa, Takeshi [1]
Mishima, Takeshi [1]
Sato, Shoei [1]
Kobayashi, Akio [2]
Affiliations
[1] NHK Japan Broadcasting Corp, Shibuya, Japan
[2] Tsukuba Univ Technol, Tsukuba, Ibaraki, Japan
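Keywords
end-to-end ASR; acoustic modeling; connectionist temporal classification; long short-term memory
DOI
Not available
Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification code
0808; 0809
Abstract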
We developed a label-design and restoration method for end-to-end automatic speech recognition based on connectionist temporal classification (CTC). When an end-to-end speech-recognition system has thousands of output labels, such as words or characters, it is difficult to train a robust model because of data sparsity. With our proposed method, characters with little training data are estimated from the context provided by a language model rather than from the acoustic features. Our method involves two steps. First, we train acoustic models using 70 class labels instead of thousands of low-frequency labels. Second, the class labels are restored to the original labels by using a weighted finite-state transducer and an n-gram language model. We applied the proposed method to a Japanese end-to-end automatic speech-recognition system with over 3,000 character labels. Experimental results indicate that our method improved the word error rate by up to 15.5% relative to a conventional CTC-based method and that its accuracy is comparable to that of state-of-the-art hybrid DNN methods.
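The two steps in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not say how low-frequency characters are grouped into classes, so the round-robin grouping below is a placeholder, and a plain Viterbi search over a caller-supplied bigram scorer stands in for the weighted finite-state transducer and n-gram language model. The names `build_label_map`, `restore`, `min_count`, `num_classes`, and `bigram_score` are all hypothetical.

```python
from collections import Counter


def build_label_map(transcripts, min_count, num_classes):
    """Step 1 (sketch): keep frequent characters, merge rare ones into classes."""
    counts = Counter(ch for text in transcripts for ch in text)
    # Frequent characters keep their own output label.
    label_map = {ch: ch for ch, n in counts.items() if n >= min_count}
    # ASSUMPTION: round-robin grouping of rare characters; the paper's actual
    # clustering criterion is not given in the abstract.
    rare = sorted(ch for ch, n in counts.items() if n < min_count)
    for i, ch in enumerate(rare):
        label_map[ch] = f"<C{i % num_classes}>"
    return label_map


def restore(labels, label_map, bigram_score):
    """Step 2 (sketch): pick the best original character for each class label.

    bigram_score(prev, cur) returns a log-score; higher is better.
    ASSUMPTION: a bigram Viterbi search replaces the WFST + n-gram decoder.
    """
    # Invert the map: label -> list of characters it may stand for.
    members = {}
    for ch, lab in label_map.items():
        members.setdefault(lab, []).append(ch)

    # best[c] = (score, path) of the best hypothesis ending in character c.
    best = {"<s>": (0.0, [])}
    for lab in labels:
        new_best = {}
        for cand in members.get(lab, [lab]):
            score, path = max(
                ((s + bigram_score(prev, cand), p) for prev, (s, p) in best.items()),
                key=lambda item: item[0],
            )
            new_best[cand] = (score, path + [cand])
        best = new_best
    return "".join(max(best.values(), key=lambda item: item[0])[1])
```

In this sketch the acoustic model would be trained on the labels produced by `build_label_map`, and `restore` would run on its decoded label sequence. Acoustic scores are ignored during restoration here, which is a simplification.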
Pages: 187-191
Number of pages: 5