Exploiting variable length segments with coarticulation effect in online speech recognition based on deep bidirectional recurrent neural network and context-sensitive segment

Cited by: 0
Authors
Song-Il Mun
Chol-Jin Han
Hye-Song Hong
Affiliations
[1] KIM IL SUNG University, Faculty of Information Science
Keywords
Online speech recognition; Deep Bidirectional Recurrent Neural Network (DBRNN); Connectionist Temporal Classification (CTC); Coarticulation effect; Variable length chunk;
DOI
Not available
Chinese Library Classification number
Subject classification code
Abstract
A deep bidirectional recurrent neural network (DBRNN) is a powerful acoustic model that can capture the dynamics and coarticulation effects of the speech signal. It can model temporal sequences that depend on both left and right contexts, whereas a deep unidirectional recurrent neural network (or deep recurrent neural network) can model temporal sequences that usually depend only on past information. When traditional DBRNNs are used for online speech recognition, context-sensitive segments of carefully selected fixed length are exploited to balance recognition accuracy and latency, because the ASR decoder otherwise depends on the whole input sequence in each evaluation, which introduces recognition latency. On the other hand, the acoustic realization of a phoneme depends not only on the left-side phoneme but also on the right-side phoneme, and this should be considered in acoustic modeling for speech recognition. In this paper, we propose a DBRNN-based online speech recognition method that selects and exploits variable length chunks to take into account the coarticulation effects that appear in speech production. To select variable length segments that reflect the coarticulation effects, the vowel identification points predicted by a deep unidirectional recurrent neural network are used, and these variable length segments are used to train the DBRNN for online recognition. The deep unidirectional recurrent neural network for predicting variable length segments is trained using the connectionist temporal classification (CTC) method. In experiments on Korean speech recognition, we show that the online DBRNN acoustic model constructed using variable length chunks with coarticulation effects effectively limits recognition latency, achieves performance comparable to a traditional offline DBRNN, and outperforms online recognition based on fixed-length context-sensitive chunks.
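The chunking idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `split_at_vowel_points`, the `Chunk` tuple, and the `left_context` / `right_context` parameters are hypothetical, and the sketch simply assumes that vowel identification points are given as frame indices and that each chunk is padded with a few context frames on both sides before being passed to a bidirectional model.

```python
from typing import List, NamedTuple, Sequence


class Chunk(NamedTuple):
    ctx_start: int  # first frame fed to the model (includes left context)
    start: int      # first frame whose output is kept
    end: int        # one past the last frame whose output is kept
    ctx_end: int    # one past the last frame fed to the model (includes right context)


def split_at_vowel_points(
    num_frames: int,
    vowel_points: Sequence[int],
    left_context: int = 2,
    right_context: int = 2,
) -> List[Chunk]:
    """Cut [0, num_frames) into variable length chunks at the given frame indices.

    Assumption: vowel_points are frame indices (e.g. predicted by a separate
    unidirectional model); points outside (0, num_frames) are ignored.
    """
    if num_frames <= 0:
        return []
    # Deduplicate and sort the usable boundary points.
    boundaries = sorted({p for p in vowel_points if 0 < p < num_frames})
    edges = [0] + boundaries + [num_frames]
    chunks = []
    for start, end in zip(edges[:-1], edges[1:]):
        chunks.append(
            Chunk(
                ctx_start=max(0, start - left_context),
                start=start,
                end=end,
                ctx_end=min(num_frames, end + right_context),
            )
        )
    return chunks


if __name__ == "__main__":
    for chunk in split_at_vowel_points(20, [5, 12], left_context=2, right_context=3):
        print(chunk)
```

In this sketch the chunk length follows the spacing of the boundary points instead of a fixed window, so latency is bounded by the longest chunk plus its right context rather than by the whole utterance.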
Pages: 135-146
Number of pages: 11
Related papers
8 items in total
  • [1] Exploiting variable length segments with coarticulation effect in online speech recognition based on deep bidirectional recurrent neural network and context-sensitive segment
    Mun, Song-Il
    Han, Chol-Jin
    Hong, Hye-Song
    [J]. INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY, 2022, 25 (01) : 135 - 146
  • [2] LEARNING RECURRENT NEURAL NETWORK LANGUAGE MODELS WITH CONTEXT-SENSITIVE LABEL SMOOTHING FOR AUTOMATIC SPEECH RECOGNITION
    Song, Minguang
    Zhao, Yunxin
    Wang, Shaojun
    Han, Mei
    [J]. 2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 6159 - 6163
  • [3] Speech enhancement method using context-sensitive attention mechanism and recurrent neural network
    Lan, Tian
    Hui, Guoqiang
    Li, Meng
    Lü, Yilan
    Liu, Qiao
    [J]. Shengxue Xuebao/Acta Acustica, 2020, 45 (06): 897 - 905
  • [4] ACCELERATING RECURRENT NEURAL NETWORK LANGUAGE MODEL BASED ONLINE SPEECH RECOGNITION SYSTEM
    Lee, Kyungmin
    Park, Chiyoun
    Kim, Namhoon
    Lee, Jaewon
    [J]. 2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 5904 - 5908
  • [5] Punctuation Restoration for Ukrainian Broadcast Speech Recognition System based on Bidirectional Recurrent Neural Network and Word Embeddings
    Sazhok, Mykola
    Poltieva, Anna
    Robeiko, Valentyna
    Seliukh, Ruslan
    Fedoryn, Dmytro
    [J]. COLINS 2021: COMPUTATIONAL LINGUISTICS AND INTELLIGENT SYSTEMS, VOL I, 2021, 2870
  • [6] Effective Human Motor Imagery Recognition via Segment Pool Based on One-Dimensional Convolutional Neural Network with Bidirectional Recurrent Attention Unit Network
    Hu, Huawen
    Yue, Chenxi
    Shi, Enze
    Yu, Sigang
    Kang, Yanqing
    Wu, Jinru
    Wang, Jiaqi
    Zhang, Shu
    [J]. APPLIED SCIENCES-BASEL, 2023, 13 (16):
  • [7] Complex active sonar targets recognition using variable length deep convolutional neural network evolved by biogeography-based optimizer
    Khishe, Mohammad
    Mohammadi, Mokhtar
    Mohammed, Adil Hussein
    [J]. WAVES IN RANDOM AND COMPLEX MEDIA, 2022,
  • [8] Online Speech Recognition Using Multichannel Parallel Acoustic Score Computation and Deep Neural Network (DNN)-Based Voice-Activity Detector
    Oh, Yoo Rhee
    Park, Kiyoung
    Park, Jeon Gyu
    [J]. APPLIED SCIENCES-BASEL, 2020, 10 (12):