Time Delay Recurrent Neural Network for Speech Recognition

Cited by: 5
Authors
Liu, Boji [1]
Zhang, Weibin [1]
Xu, Xiangming [1]
Chen, Dongpeng [2]
Affiliations
[1] South China Univ Technol, Sch Elect & Informat Engn, Guangzhou, Guangdong, Peoples R China
[2] VoiceAI Technol, Shenzhen, Peoples R China
Keywords
DOI
10.1088/1742-6596/1229/1/012078
Chinese Library Classification number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
In Automatic Speech Recognition (ASR), the Time Delay Neural Network (TDNN) has proven to be an efficient network structure because of its strong context-modeling ability. In addition, as a feed-forward architecture, a TDNN is faster to train than recurrent neural networks such as Long Short-Term Memory (LSTM). However, unlike in recurrent neural networks, the context in a TDNN is carefully designed and limited. Although stacking LSTM layers together with TDNN layers to extend the context information has been proven useful, the resulting model is overly complex and hard to train. In this paper, we focus on directly extending the context-modeling capability of TDNNs by adding recurrent connections. Several new network architectures were investigated. The results on Switchboard show that the best model significantly outperforms the baseline TDNN system and is comparable with the TDNN-LSTM architecture. In addition, the training process is much simpler than that of TDNN-LSTM.
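The abstract does not say where the recurrent connections are placed or what form they take, so the following is only an illustrative sketch of the general idea, not the authors' architecture. It assumes a single layer that splices a fixed window of frames (the time-delay part) and, optionally, feeds its own previous output back in (the recurrent part). The function name `tdnn_recurrent_layer`, the offsets, and all weight values are hypothetical.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def tdnn_recurrent_layer(frames, weights, recurrent, bias, offsets=(-1, 0, 1)):
    """Run one time-delay layer with an optional recurrent connection.

    frames:    list of T input vectors, each of length input_dim
    weights:   dict mapping each offset to a hidden_dim x input_dim matrix
    recurrent: hidden_dim x hidden_dim matrix, or None for a plain TDNN layer
    bias:      list of hidden_dim floats
    """
    num_frames = len(frames)
    prev = [0.0] * len(bias)  # initial hidden state, assumed to be zero
    outputs = []
    for t in range(num_frames):
        total = list(bias)
        # Feed-forward part: combine a fixed window of neighbouring frames.
        for d in offsets:
            idx = min(max(t + d, 0), num_frames - 1)  # clamp at utterance edges
            for i, v in enumerate(matvec(weights[d], frames[idx])):
                total[i] += v
        # Recurrent part: feed back the previous output of this same layer.
        if recurrent is not None:
            for i, v in enumerate(matvec(recurrent, prev)):
                total[i] += v
        prev = [math.tanh(a) for a in total]
        outputs.append(prev)
    return outputs


if __name__ == "__main__":
    # Hypothetical 1-dimensional toy setup: an impulse in the first frame only.
    toy_weights = {d: [[0.5]] for d in (-1, 0, 1)}
    impulse = [[1.0]] + [[0.0]] * 5
    plain = tdnn_recurrent_layer(impulse, toy_weights, None, [0.0])
    looped = tdnn_recurrent_layer(impulse, toy_weights, [[0.8]], [0.0])
    print("last frame, plain TDNN layer:    ", plain[-1][0])
    print("last frame, with recurrent term: ", looped[-1][0])
```

In this toy run the plain layer's last output cannot see the first frame, because the frame lies outside the fixed window, while the recurrent variant still carries a trace of it. That is the sense in which a recurrent connection extends the otherwise limited context.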
Pages: 8
Related papers
50 in total
  • [1] Gated Time Delay Neural Network for Speech Recognition
    Chen, Kaibin
    Zhang, Weibin
    Chen, Dongpeng
    Huang, Xiaorong
    Liu, Boji
    Xu, Xiangmin
    [J]. 2019 3RD INTERNATIONAL CONFERENCE ON MACHINE VISION AND INFORMATION TECHNOLOGY (CMVIT 2019), 2019, 1229
  • [2] Recurrent neural network with backpropagation through time for speech recognition
    Ahmad, AM
    Ismail, S
    Samaon, DF
    [J]. IEEE INTERNATIONAL SYMPOSIUM ON COMMUNICATIONS AND INFORMATION TECHNOLOGIES 2004 (ISCIT 2004), PROCEEDINGS, VOLS 1 AND 2: SMART INFO-MEDIA SYSTEMS, 2004, : 98 - 102
  • [3] Convolutional Time Delay Neural Network for Khmer Automatic Speech Recognition
    Srun, Nalin
    Leang, Sotheara
    Thu, Ye Kyaw
    Sam, Sethserey
    [J]. 2022 17TH INTERNATIONAL JOINT SYMPOSIUM ON ARTIFICIAL INTELLIGENCE AND NATURAL LANGUAGE PROCESSING (ISAI-NLP 2022) / 3RD INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND INTERNET OF THINGS (AIOT 2022), 2022,
  • [4] Stochastic Recurrent Neural Network for Speech Recognition
    Chien, Jen-Tzung
    Shen, Chen
    [J]. 18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 1313 - 1317
  • [5] Advances in Automatic Speech Recognition for Child Speech Using Factored Time Delay Neural Network
    Wu, Fei
    Garcia, Leibny Paola
    Povey, Daniel
    Khudanpur, Sanjeev
    [J]. INTERSPEECH 2019, 2019, : 1 - 5
  • [6] DEEP RECURRENT REGULARIZATION NEURAL NETWORK FOR SPEECH RECOGNITION
    Chien, Jen-Tzung
    Lu, Tsai-Wei
    [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, : 4560 - 4564
  • [7] Implementation of an autoassociative Recurrent Neural Network for speech recognition
    Cocchiglia, A
    Paplinski, A
    [J]. IEEE TENCON'97 - IEEE REGIONAL 10 ANNUAL CONFERENCE, PROCEEDINGS, VOLS 1 AND 2: SPEECH AND IMAGE TECHNOLOGIES FOR COMPUTING AND TELECOMMUNICATIONS, 1997, : 245 - 248
  • [8] Automatic Speech Recognition trained with Convolutional Neural Network and predicted with Recurrent Neural Network
    Soundarya, M.
    Karthikeyan, P. R.
    Thangarasu, Gunasekar
    [J]. 2023 9TH INTERNATIONAL CONFERENCE ON ELECTRICAL ENERGY SYSTEMS, ICEES, 2023, : 41 - 45
  • [9] Dysarthric Speech Recognition using Time-delay Neural Network based Denoising Autoencoder
    Bhat, Chitralekha
    Das, Biswajit
    Vachhani, Bhavik
    Kopparapu, Sunil Kumar
    [J]. 19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 451 - 455
  • [10] Recurrent Neural Network Language Model with Part-of-speech for Mandarin Speech Recognition
    Gong, Caixia
    Li, Xiangang
    Wu, Xihong
    [J]. 2014 9TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2014, : 459 - 463