Investigating Bidirectional Recurrent Neural Network Language Models for Speech Recognition

Cited by: 24

Authors:

Chen, X. [1]

Ragni, A. [1]

Liu, X. [2]

Gales, M. J. F. [1]

Affiliations:

[1] Univ Cambridge, Engn Dept, Cambridge, England

[2] Chinese Univ Hong Kong, Hong Kong, Peoples R China

Source:

18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION | 2017

Keywords:

language model; bidirectional recurrent neural network; speech recognition; interpolation;

DOI:

10.21437/Interspeech.2017-513

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Recurrent neural network language models (RNNLMs) are powerful language modeling techniques. Significant performance improvements over n-gram language models have been reported in a range of tasks, including speech recognition. Conventional n-gram and neural network language models are trained to predict the probability of the next word given its preceding context history. In contrast, bidirectional recurrent neural network based language models consider the context from future words as well. This complicates the inference process, but has theoretical benefits for tasks such as speech recognition, as additional context information can be used. However, to date, very limited or no gains in speech recognition performance have been reported with this form of model. This paper examines the issues of training bidirectional recurrent neural network language models (bi-RNNLMs) for speech recognition. A bi-RNNLM probability smoothing technique is proposed that addresses the very sharp posteriors that are often observed in these models. The performance of the bi-RNNLMs is evaluated on three speech recognition tasks: broadcast news; meeting transcription (AMI); and low-resource systems (Babel data). On all tasks, gains are observed by applying the smoothing technique to the bi-RNNLM. In addition, consistent performance gains can be obtained by combining bi-RNNLMs with n-gram and uni-directional RNNLMs.


Pages: 269-273

Page count: 5


[1] BIDIRECTIONAL RECURRENT NEURAL NETWORK LANGUAGE MODELS FOR AUTOMATIC SPEECH RECOGNITION
Arisoy, Ebru
Sethy, Abhinav
Ramabhadran, Bhuvana
Chen, Stanley
[J]. 2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, : 5421 - 5425
[2] GAUSSIAN PROCESS LSTM RECURRENT NEURAL NETWORK LANGUAGE MODELS FOR SPEECH RECOGNITION
Lam, Max W. Y.
Chen, Xie
Hu, Shoukang
Yu, Jianwei
Liu, Xunying
Meng, Helen
[J]. 2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 7235 - 7239
[3] Latent Words Recurrent Neural Network Language Models for Automatic Speech Recognition
Masumura, Ryo
Asami, Taichi
Oba, Takanobu
Sakauchi, Sumitaka
Ito, Akinori
[J]. IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2019, E102D (12) : 2557 - 2567
[4] Efficient Training and Evaluation of Recurrent Neural Network Language Models for Automatic Speech Recognition
Chen, Xie
Liu, Xunying
Wang, Yongqiang
Gales, Mark J. F.
Woodland, Philip C.
[J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2016, 24 (11) : 2146 - 2157
[5] LIMITED-MEMORY BFGS OPTIMIZATION OF RECURRENT NEURAL NETWORK LANGUAGE MODELS FOR SPEECH RECOGNITION
Liu, Xunying
Liu, Shansong
Sha, Jinze
Yu, Jianwei
Xu, Zhiyuan
Chen, Xie
Meng, Helen
[J]. 2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 6114 - 6118
[6] Comparison of Various Neural Network Language Models in Speech Recognition
Zuo, Lingyun
Liu, Jian
Wan, Xin
[J]. 2016 3RD INTERNATIONAL CONFERENCE ON INFORMATION SCIENCE AND CONTROL ENGINEERING (ICISCE), 2016, : 894 - 898
[7] A Speech Recognition System for Bengali Language using Recurrent Neural Network
Islam, Jahirul
Mubassira, Masiath
Islam, Md. Rakibul
Das, Amit Kumar
[J]. 2019 IEEE 4TH INTERNATIONAL CONFERENCE ON COMPUTER AND COMMUNICATION SYSTEMS (ICCCS 2019), 2019, : 73 - 76
[8] Recurrent Neural Network Language Model Adaptation for Conversational Speech Recognition
Li, Ke
Xu, Hainan
Wang, Yiming
Povey, Daniel
Khudanpur, Sanjeev
[J]. 19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 3373 - 3377
[9] Recurrent Neural Network Language Model with Part-of-speech for Mandarin Speech Recognition
Gong, Caixia
Li, Xiangang
Wu, Xihong
[J]. 2014 9TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2014, : 459 - 463
[10] Conversion of Recurrent Neural Network Language Models to Weighted Finite State Transducers for Automatic Speech Recognition
Lecorve, Gwenole
Motlicek, Petr
[J]. 13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012, : 1666 - 1669
