Lower Frame Rate Neural Network Acoustic Models

Cited by: 69

Authors:

Pundak, Golan [1]

Sainath, Tara N. [1]

Affiliation:

[1] Google Inc, New York, NY 10011 USA

Source:

17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES | 2016

Keywords:

speech recognition; recurrent neural networks; connectionist temporal classification

DOI:

10.21437/Interspeech.2016-275

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Subject classification codes:

070206; 082403

Abstract:

Recently, neural network acoustic models trained with Connectionist Temporal Classification (CTC) were proposed as an alternative to conventional cross-entropy-trained neural network acoustic models, which output frame-level decisions every 10ms [1]. Unlike conventional models, CTC learns an alignment jointly with the acoustic model and outputs a blank symbol in addition to the regular acoustic state units. This allows a CTC model to run at a lower frame rate, outputting decisions every 30ms rather than every 10ms as conventional models do, which improves overall system speed. In this work, we explore how conventional models behave at lower frame rates. On a large-vocabulary Voice Search task, we show that conventional models can run at a frame rate as low as one decision every 40ms while improving WER by 3% relative to a CTC-based model.
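The abstract's central mechanism is emitting one acoustic-model decision every 30ms or 40ms instead of every 10ms. The sketch below shows one common way to build such lower-frame-rate input, assuming 10ms feature frames that are stacked (concatenated) over a window and then subsampled by a fixed stride. The function names, the stack size of 8, and the choice to pad by repeating the last frame are illustrative assumptions, not details taken from the paper.

```python
from typing import List

Frame = List[float]


def stack_and_subsample(frames: List[Frame], stack: int, stride: int) -> List[Frame]:
    """Hypothetical helper: concatenate `stack` consecutive 10 ms frames,
    starting a new window every `stride` frames.

    With 10 ms input frames, stride=3 yields one output per 30 ms and
    stride=4 yields one output per 40 ms. Windows that run past the end
    are padded by repeating the last frame (an assumption for this sketch).
    """
    if stack < 1 or stride < 1:
        raise ValueError("stack and stride must be positive")
    n = len(frames)
    out: List[Frame] = []
    for start in range(0, n, stride):
        window: Frame = []
        for offset in range(stack):
            idx = min(start + offset, n - 1)  # clamp: repeat last frame as padding
            window.extend(frames[idx])
        out.append(window)
    return out


def decisions_per_second(frame_shift_ms: int) -> float:
    """Number of model outputs per second of audio for a given frame shift."""
    return 1000.0 / frame_shift_ms


if __name__ == "__main__":
    # One second of audio: 100 frames at 10 ms, each a 4-dim toy feature vector.
    feats = [[float(t)] * 4 for t in range(100)]
    for stride in (1, 3, 4):
        stacked = stack_and_subsample(feats, stack=8, stride=stride)
        print(stride * 10, "ms:", len(stacked), "outputs of dim", len(stacked[0]))
```

Under these assumptions, one second of audio needs 100 forward steps at 10ms, 34 at 30ms, and 25 at 40ms, which is where the speedup described in the abstract comes from; stacking keeps the acoustic context that subsampling would otherwise discard.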

Pages: 22-26

Number of pages: 5
