Lower Frame Rate Neural Network Acoustic Models

Cited by: 69
Authors
Pundak, Golan [1]
Sainath, Tara N. [1]
Affiliation
[1] Google Inc, New York, NY 10011 USA
Keywords
speech recognition; recurrent neural networks; connectionist temporal classification
DOI
10.21437/Interspeech.2016-275
Chinese Library Classification
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Recently, neural network acoustic models trained with Connectionist Temporal Classification (CTC) were proposed as an alternative to conventional cross-entropy trained neural network acoustic models, which output frame-level decisions every 10 ms [1]. Unlike conventional models, CTC learns an alignment jointly with the acoustic model and outputs a blank symbol in addition to the regular acoustic state units. This allows the CTC model to run at a lower frame rate, outputting decisions every 30 ms rather than every 10 ms as in conventional models, which improves overall system speed. In this work, we explore how conventional models behave at lower frame rates. On a large vocabulary Voice Search task, we show that with conventional models we can slow the frame rate to 40 ms while improving WER by 3% relative over a CTC-based model.
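The abstract does not say how the frame rate is lowered. One common way to do it, assumed here purely for illustration, is to stack several consecutive 10 ms feature frames into one wider frame and then keep only every n-th stacked frame. The sketch below shows that idea and the resulting number of decisions per second; the function names `stack_and_subsample` and `decisions_per_second` and the toy feature dimension are hypothetical and not taken from the paper.

```python
from typing import List, Sequence


def stack_and_subsample(
    frames: Sequence[Sequence[float]], stack: int, stride: int
) -> List[List[float]]:
    """Concatenate `stack` consecutive frames, advancing `stride` frames each step.

    Assumption: this is one generic way to lower the frame rate,
    not necessarily the procedure used by the authors.
    """
    if stack < 1 or stride < 1:
        raise ValueError("stack and stride must be positive")
    stacked_frames: List[List[float]] = []
    # Only emit a stacked frame when a full window of `stack` frames is available.
    for start in range(0, len(frames) - stack + 1, stride):
        stacked: List[float] = []
        for frame in frames[start:start + stack]:
            stacked.extend(frame)
        stacked_frames.append(stacked)
    return stacked_frames


def decisions_per_second(frame_shift_ms: float) -> float:
    """Number of acoustic-model decisions per second for a given frame shift."""
    return 1000.0 / frame_shift_ms


# One second of toy 2-dimensional features at a 10 ms frame shift.
features = [[float(t), float(t) + 0.5] for t in range(100)]

# Stacking 4 frames with a stride of 4 gives one decision every 40 ms.
low_rate = stack_and_subsample(features, stack=4, stride=4)

print(len(features), "frames in,", len(low_rate), "frames out")
print("decisions/s at 10 ms:", decisions_per_second(10))
print("decisions/s at 40 ms:", decisions_per_second(40))
```

Under these assumptions, a 40 ms frame shift means the acoustic model is evaluated 25 times per second instead of 100, which is where the speed gain described in the abstract would come from.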
Pages: 22-26
Number of pages: 5
Related papers
50 in total
  • [21] Spatio-Temporal Convolutional Neural Network for Frame Rate Up-Conversion
    Tanaka, Yusuke
    Omori, Toshiaki
    2019 3RD INTERNATIONAL CONFERENCE ON INTELLIGENT SYSTEMS, METAHEURISTICS & SWARM INTELLIGENCE (ISMSI 2019), 2019, : 35 - 39
  • [22] Complementary tasks for context-dependent deep neural network acoustic models
    Bell, Peter
    Renals, Steve
    16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 3610 - 3614
  • [23] STANDALONE TRAINING OF CONTEXT-DEPENDENT DEEP NEURAL NETWORK ACOUSTIC MODELS
    Zhang, C.
    Woodland, P. C.
    2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014,
  • [24] Efficient Implementation of the Room Simulator for Training Deep Neural Network Acoustic Models
    Kim, Chanwoo
    Variani, Ehsan
    Narayanan, Arun
    Bacchiani, Michiel
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 3028 - 3032
  • [25] HOW TRANSFERABLE ARE FEATURES IN CONVOLUTIONAL NEURAL NETWORK ACOUSTIC MODELS ACROSS LANGUAGES?
    Thompson, Jessica A. F.
    Schoenwiesner, Marc
    Bengio, Yoshua
    Willett, Daniel
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 2827 - 2831
  • [26] IMPROVING DEEP NEURAL NETWORK ACOUSTIC MODELS USING GENERALIZED MAXOUT NETWORKS
    Zhang, Xiaohui
    Trmal, Jan
    Povey, Daniel
    Khudanpur, Sanjeev
    2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014,
  • [27] Speaker Adaptation of Neural Network Acoustic Models Using I-Vectors
    Saon, George
    Soltau, Hagen
    Nahamoo, David
    Picheny, Michael
    2013 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING (ASRU), 2013, : 55 - 59
  • [28] Rate-Accuracy Optimization of Deep Convolutional Neural Network Models
    Filini, Alessandro
    Ascenso, Joao
    Leonardi, Riccardo
    2017 IEEE INTERNATIONAL SYMPOSIUM ON MULTIMEDIA (ISM), 2017, : 91 - 98
  • [29] Mixture cure rate models with neural network estimated nonparametric components
    Xie, Yujing
    Yu, Zhangsheng
    COMPUTATIONAL STATISTICS, 2021, 36 (04) : 2467 - 2489
  • [30] Neural network modeling of fermentation processes: Specific kinetic rate models
    Koprinkova, P
    Petrova, M
    Patarinska, T
    Bliznakova, M
    CYBERNETICS AND SYSTEMS, 1998, 29 (03) : 303 - 317