PHONEME BASED NEURAL TRANSDUCER FOR LARGE VOCABULARY SPEECH RECOGNITION

Cited by: 7
Authors
Zhou, Wei [1, 2]
Berger, Simon [1]
Schlueter, Ralf [1, 2]
Ney, Hermann [1, 2]
Affiliations
[1] RWTH Aachen University, Computer Science Department, Human Language Technology and Pattern Recognition, D-52074 Aachen, Germany
[2] AppTek GmbH, D-52062 Aachen, Germany
Funding
European Research Council
Keywords
phoneme; neural transducer; speech recognition; RNN-transducer
DOI
10.1109/ICASSP39728.2021.9413648
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
To join the advantages of classical and end-to-end approaches for speech recognition, we present a simple, novel and competitive approach for phoneme-based neural transducer modeling. Different alignment label topologies are compared and word-end-based phoneme label augmentation is proposed to improve performance. Utilizing the local dependency of phonemes, we adopt a simplified neural network structure and a straightforward integration with the external word-level language model to preserve the consistency of seq-to-seq modeling. We also present a simple, stable and efficient training procedure using frame-wise cross-entropy loss. A phonetic context size of one is shown to be sufficient for the best performance. A simplified scheduled sampling approach is applied for further improvement and different decoding approaches are briefly compared. The overall performance of our best model is comparable to state-of-the-art (SOTA) results for the TED-LIUM Release 2 and Switchboard corpora.
Pages: 5644-5648
Page count: 5