Time Delay Recurrent Neural Network for Speech Recognition

Cited by: 5

Authors:

Liu, Boji [1]

Zhang, Weibin [1]

Xu, Xiangming [1]

Chen, Dongpeng [2]

Affiliations:

[1] South China Univ Technol, Sch Elect & Informat Engn, Guangzhou, Guangdong, Peoples R China

[2] VoiceAI Technol, Shenzhen, Peoples R China

Source:

2019 3RD INTERNATIONAL CONFERENCE ON MACHINE VISION AND INFORMATION TECHNOLOGY (CMVIT 2019) | 2019 / Vol. 1229

Keywords:

DOI:

10.1088/1742-6596/1229/1/012078

Chinese Library Classification (CLC) number:

TP31 [Computer Software];

Discipline classification code:

081202 ; 0835 ;

Abstract:

In automatic speech recognition (ASR), the time delay neural network (TDNN) has proven to be an efficient network structure because of its strong context-modeling ability. In addition, as a feed-forward architecture, a TDNN is faster to train than recurrent neural networks such as long short-term memory (LSTM) networks. However, unlike in recurrent neural networks, the context in a TDNN is carefully designed and limited. Although stacking LSTM layers with a TDNN to extend the context has proven useful, the resulting model is complex and hard to train. In this paper, we focus on directly extending the context-modeling capability of TDNNs by adding recurrent connections. Several new network architectures are investigated. Results on Switchboard show that the best model significantly outperforms the baseline TDNN system and is comparable to the TDNN-LSTM architecture. In addition, its training process is much simpler than that of TDNN-LSTM.
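
The contrast the abstract draws between a fixed TDNN context and a recurrent one can be illustrated with a toy example. This record does not describe the architectures the paper investigates, so the sketch below is only a generic illustration under stated assumptions: the function names (`splice`, `tdnn_layer`, `recurrent_tdnn_layer`), the tanh nonlinearity, the edge clamping, the one-frame recurrent delay, and all weights are hypothetical choices, not the authors' design.

```python
import math

def splice(frames, t, offsets):
    """Concatenate the input frames at t+offset, clamping at sequence edges."""
    last = len(frames) - 1
    spliced = []
    for off in offsets:
        idx = min(max(t + off, 0), last)
        spliced.extend(frames[idx])
    return spliced

def tdnn_layer(frames, weights, bias, offsets):
    """Feed-forward TDNN layer: output t sees only frames inside the offsets."""
    out = []
    for t in range(len(frames)):
        x = splice(frames, t, offsets)
        out.append([
            math.tanh(sum(w * v for w, v in zip(row, x)) + b)
            for row, b in zip(weights, bias)
        ])
    return out

def recurrent_tdnn_layer(frames, weights, rec_weights, bias, offsets, delay=1):
    """Same splice, plus a term from this layer's own output at t - delay.

    The recurrent term is an assumption made for illustration only.
    """
    n_out = len(weights)
    out = []
    for t in range(len(frames)):
        x = splice(frames, t, offsets)
        prev = out[t - delay] if t - delay >= 0 else [0.0] * n_out
        out.append([
            math.tanh(sum(w * v for w, v in zip(row, x))
                      + sum(r * h for r, h in zip(rrow, prev)) + b)
            for row, rrow, b in zip(weights, rec_weights, bias)
        ])
    return out

# Impulse at frame 0, one-dimensional frames, context of one frame each side.
frames = [[1.0]] + [[0.0]] * 5
offsets = (-1, 0, 1)
weights = [[0.5, 0.5, 0.5]]   # hypothetical values
rec_weights = [[0.9]]         # hypothetical values
bias = [0.0]

ff = tdnn_layer(frames, weights, bias, offsets)
rec = recurrent_tdnn_layer(frames, weights, rec_weights, bias, offsets)
```

In this toy setup the feed-forward output stops responding to the impulse once frame 0 falls outside the spliced window, while the recurrent variant keeps carrying it forward through its own previous output. That is the sense in which a recurrent connection extends context beyond the designed window.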


Pages: 8