DEEP CONVOLUTIONAL NEURAL NETWORKS FOR LVCSR

Cited by: 0

Authors:

Sainath, Tara N. [1]

Mohamed, Abdel-rahman

Kingsbury, Brian [1]

Ramabhadran, Bhuvana [1]

Affiliations:

[1] IBM Corp, TJ Watson Res Ctr, Yorktown Hts, NY 10598 USA

Source:

2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2013

Keywords:

Neural Networks; Speech Recognition;

DOI:

Not available

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

Convolutional Neural Networks (CNNs) are an alternative type of neural network that can be used to reduce spectral variations and model spectral correlations which exist in signals. Since speech signals exhibit both of these properties, CNNs are a more effective model for speech compared to Deep Neural Networks (DNNs). In this paper, we explore applying CNNs to large vocabulary speech tasks. First, we determine the appropriate architecture to make CNNs effective compared to DNNs for LVCSR tasks. Specifically, we focus on how many convolutional layers are needed, what is the optimal number of hidden units, what is the best pooling strategy, and the best input feature type for CNNs. We then explore the behavior of neural network features extracted from CNNs on a variety of LVCSR tasks, comparing CNNs to DNNs and GMMs. We find that CNNs offer between a 13-30% relative improvement over GMMs, and a 4-12% relative improvement over DNNs, on a 400-hr Broadcast News and 300-hr Switchboard task.

Pages: 8614-8618

Number of pages: 5

Related records: 50 in total

[1] IMPROVEMENTS TO DEEP CONVOLUTIONAL NEURAL NETWORKS FOR LVCSR
Sainath, Tara N.
Kingsbury, Brian
Mohamed, Abdel-rahman
Dahl, George E.
Saon, George
Soltau, Hagen
Beran, Tomas
Aravkin, Aleksandr Y.
Ramabhadran, Bhuvana
[J]. 2013 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING (ASRU), 2013, : 315 - 320
[2] Very Deep Convolutional Neural Networks for LVCSR
Bi, Mengxiao
Qian, Yanmin
Yu, Kai
[J]. 16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 3259 - 3263
[3] Advances in Very Deep Convolutional Neural Networks for LVCSR
Sercu, Tom
Goel, Vaibhava
[J]. 17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 3429 - 3433
[4] VERY DEEP MULTILINGUAL CONVOLUTIONAL NEURAL NETWORKS FOR LVCSR
Sercu, Tom
Puhrsch, Christian
Kingsbury, Brian
LeCun, Yann
[J]. 2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 4955 - 4959
[5] Deep Scattering Spectra with Deep Neural Networks for LVCSR Tasks
Sainath, Tara N.
Peddinti, Vijayaditya
Kingsbury, Brian
Fousek, Petr
Ramabhadran, Bhuvana
Nahamoo, David
[J]. 15TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2014), VOLS 1-4, 2014, : 900 - 904
[6] Investigation into the use of deep neural networks for LVCSR of Czech
Mateju, Lukas
Cerva, Petr
Zdansky, Jindrich
[J]. 2015 IEEE INTERNATIONAL WORKSHOP OF ELECTRONICS, CONTROL, MEASUREMENT, SIGNALS AND THEIR APPLICATION TO MECHATRONICS (ECMSM), 2015,
[7] Convolutional Neural Networks for Acoustic Modeling of Raw Time Signal in LVCSR
Golik, Pavel
Tueske, Zoltan
Schlueter, Ralf
Ney, Hermann
[J]. 16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 26 - 30
[8] Deep Convolutional Neural Networks
Gonzalez, Rafael C.
[J]. IEEE SIGNAL PROCESSING MAGAZINE, 2018, 35 (06) : 79 - 87
[9] IMPROVING DEEP NEURAL NETWORKS FOR LVCSR USING DROPOUT AND SHRINKING STRUCTURE
Zhang, Shiliang
Bao, Yebo
Zhou, Pan
Jiang, Hui
Dai, Lirong
[J]. 2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014,
[10] Deep Anchored Convolutional Neural Networks
Huang, Jiahui
Dwivedi, Kshitij
Roig, Gemma
[J]. 2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS (CVPRW 2019), 2019, : 639 - 647