Lower Frame Rate Neural Network Acoustic Models

Cited by: 69
Authors
Pundak, Golan [1]
Sainath, Tara N. [1]
Affiliation
[1] Google Inc, New York, NY 10011 USA
Keywords
speech recognition; recurrent neural networks; connectionist temporal classification
DOI
10.21437/Interspeech.2016-275
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Recently, neural network acoustic models trained with Connectionist Temporal Classification (CTC) were proposed as an alternative to conventional cross-entropy trained neural network acoustic models, which output frame-level decisions every 10 ms [1]. As opposed to conventional models, CTC learns an alignment jointly with the acoustic model and outputs a blank symbol in addition to the regular acoustic state units. This allows the CTC model to run at a lower frame rate, outputting decisions every 30 ms rather than every 10 ms as in conventional models, thus improving overall system speed. In this work, we explore how conventional models behave at lower frame rates. On a large vocabulary Voice Search task, we show that with conventional models we can slow the frame rate to one decision every 40 ms while improving WER by 3% relative over a CTC-based model.
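The abstract does not say how the lower frame rate is obtained, so the following is only a rough illustration of the arithmetic behind the 10 ms, 30 ms, and 40 ms figures. It assumes plain decimation of 10 ms frames, which may differ from the paper's actual method. The function names `decisions_per_second` and `subsample_frames` are hypothetical and exist only for this sketch.

```python
def decisions_per_second(frame_period_ms: float) -> float:
    """Number of acoustic-model output decisions per second of audio."""
    return 1000.0 / frame_period_ms


def subsample_frames(frames, factor):
    """Keep every `factor`-th frame, starting from the first (assumed decimation)."""
    return frames[::factor]


# One second of audio at a 10 ms frame period gives 100 frames.
# Integer indices stand in for real feature vectors here.
frames_10ms = list(range(100))

frames_30ms = subsample_frames(frames_10ms, 3)  # roughly one decision per 30 ms
frames_40ms = subsample_frames(frames_10ms, 4)  # one decision per 40 ms

for period in (10, 30, 40):
    print(period, "ms ->", round(decisions_per_second(period), 1), "decisions/s")
```

Under this assumption, moving from a 10 ms to a 40 ms frame period cuts the number of model evaluations per second of audio by a factor of four, which is the source of the speed gain the abstract refers to.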
Pages: 22-26
Page count: 5