PHONEME BASED NEURAL TRANSDUCER FOR LARGE VOCABULARY SPEECH RECOGNITION

Cited by: 7

Authors:

Zhou, Wei ^{[1,2]}

Berger, Simon ^{[1]}

Schlueter, Ralf ^{[1,2]}

Ney, Hermann ^{[1,2]}

Affiliations:

[1] RWTH Aachen University, Computer Science Department, Human Language Technology and Pattern Recognition, D-52074 Aachen, Germany

[2] AppTek GmbH, D-52062 Aachen, Germany

Source:

2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021) | 2021

Funding:

European Research Council;

Keywords:

phoneme; neural transducer; speech recognition; RNN-TRANSDUCER;

DOI:

10.1109/ICASSP39728.2021.9413648

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

To join the advantages of classical and end-to-end approaches for speech recognition, we present a simple, novel and competitive approach for phoneme-based neural transducer modeling. Different alignment label topologies are compared and word-end-based phoneme label augmentation is proposed to improve performance. Utilizing the local dependency of phonemes, we adopt a simplified neural network structure and a straightforward integration with the external word-level language model to preserve the consistency of seq-to-seq modeling. We also present a simple, stable and efficient training procedure using frame-wise cross-entropy loss. A phonetic context size of one is shown to be sufficient for the best performance. A simplified scheduled sampling approach is applied for further improvement and different decoding approaches are briefly compared. The overall performance of our best model is comparable to state-of-the-art (SOTA) results for the TED-LIUM Release 2 and Switchboard corpora.


Pages: 5644 - 5648

Page count: 5

50 items in total

[41] A Study of the Recurrent Neural Network Encoder-Decoder for Large Vocabulary Speech Recognition
Lu, Liang
Zhang, Xingxing
Cho, Kyunghyun
Renals, Steve
[J]. 16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 3249 - 3253
[42] Phoneme-based Thai speech recognition using fuzzy system and neural network
Cheirsilp, R
Santiprabhob, P
[J]. IC-AI'2000: PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 1-III, 2000, : 393 - 397
[43] Improving Large Vocabulary Urdu Speech Recognition System using Deep Neural Networks
Farooq, Muhammad Umar
Adeeba, Farah
Rauf, Sahar
Hussain, Sarmad
[J]. INTERSPEECH 2019, 2019, : 2978 - 2982
[44] Large Vocabulary Speech Recognition Using Deep Neural Networks: Insights, Theory, and Practice
Yu, Dong
[J]. 2012 8TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING, 2012, : XXXI - XXXI
[45] Phoneme-based speech recognition via fuzzy neural networks modeling and learning
Kasabov, NK
Kozma, R
Watts, MJ
[J]. INFORMATION SCIENCES, 1998, 110 (1-2) : 61 - 79
[46] Mouth Shape Sequence Recognition Based on Speech Phoneme Recognition
Xu, Ming
Hu, Ruimin
[J]. 2006 FIRST INTERNATIONAL CONFERENCE ON COMMUNICATIONS AND NETWORKING IN CHINA, 2006,
[47] Long Short-Term Memory based Convolutional Recurrent Neural Networks for Large Vocabulary Speech Recognition
Li, Xiangang
Wu, Xihong
[J]. 16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 3219 - 3223
[48] PHONEME GROUPING FOR SPEECH RECOGNITION
REDDY, DR
[J]. JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1967, 41 (05) : 1295 - &
[49] Improved Phoneme-Based Myoelectric Speech Recognition
Zhou, Quan
Jiang, Ning
Englehart, Kevin
Hudgins, Bernard
[J]. IEEE TRANSACTIONS ON BIOMEDICAL ENGINEERING, 2009, 56 (08) : 2016 - 2023
[50] Robust Phoneme Recognition Based on Biomimetic Speech Contours
Carlin, Michael A.
Patil, Kailash
Nemala, Sridhar Krishna
Elhilali, Mounya
[J]. 13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012, : 1346 - 1349
