EXPLOITING SPARSENESS IN DEEP NEURAL NETWORKS FOR LARGE VOCABULARY SPEECH RECOGNITION

Cited: 0
Authors
Yu, Dong [1 ]
Seide, Frank [2 ]
Li, Gang [2 ]
Deng, Li [1 ]
Affiliations
[1] Microsoft Res, Redmond, WA USA
[2] Microsoft Res Asia, Beijing, Peoples R China
Keywords
speech recognition; deep belief networks; deep neural networks; sparseness
DOI
Not available
Chinese Library Classification (CLC)
O42 [Acoustics]
Subject Classification Codes
070206; 082403
Abstract
Recently, we developed context-dependent deep neural network (DNN) hidden Markov models for large vocabulary speech recognition. While the DNN reduces errors by 33% compared to its discriminatively trained Gaussian-mixture counterpart on the Switchboard benchmark task, it requires many more parameters. In this paper, we report our recent work on improving the DNN's generalization, model size, and computation speed by exploiting parameter sparseness. We formulate the goal of enforcing sparseness as soft regularization and convex constraint optimization problems, and propose solutions under the stochastic gradient ascent setting. We also propose novel data structures that exploit the random sparseness patterns to reduce model size and computation time. The proposed solutions have been evaluated on the voice search and Switchboard datasets. They decrease the number of nonzero connections to one third while reducing the error rate by 0.2-0.3% relative to the fully connected model on both datasets. The nonzero connections can be further reduced to only 12% and 19% on the two respective datasets without sacrificing speech recognition performance. Under these conditions we can reduce the model size to 18% and 29%, and computation to 14% and 23%, respectively, on these two datasets.
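The two ingredients named in the abstract can be pictured as (a) a pruning/projection step applied during stochastic gradient training that zeroes small-magnitude connections, and (b) a data structure that stores only the surviving nonzero connections. The NumPy sketch below illustrates that general idea only; the function names, the keep_fraction parameter, and the simple CSR-like row storage are illustrative assumptions, not the authors' implementation, and the descent-style update merely stands in for the paper's stochastic gradient ascent setting.

import numpy as np

def gradient_step_with_sparseness(W, grad, lr=0.1, keep_fraction=0.33):
    # Ordinary (descent-style) update; the paper works in an ascent setting.
    W = W - lr * grad
    # Keep only the keep_fraction largest-magnitude connections, zero the rest
    # (ties at the threshold are kept as well).
    k = int(keep_fraction * W.size)
    if 0 < k < W.size:
        flat = np.abs(W).ravel()
        threshold = np.partition(flat, flat.size - k)[flat.size - k]
        W = np.where(np.abs(W) >= threshold, W, 0.0)
    return W

def to_sparse_rows(W):
    # Store, per output row, only the column indices and values of the
    # nonzero connections (a simple stand-in for the paper's data structures).
    rows = []
    for r in W:
        idx = np.nonzero(r)[0]
        rows.append((idx.astype(np.int32), r[idx].astype(np.float32)))
    return rows

def sparse_matvec(rows, x):
    # y = W @ x computed from the stored nonzero connections only.
    return np.array([np.dot(vals, x[idx]) for idx, vals in rows])

# Tiny usage example with random data.
rng = np.random.default_rng(0)
W = rng.standard_normal((4, 8))
W = gradient_step_with_sparseness(W, rng.standard_normal(W.shape))
y = sparse_matvec(to_sparse_rows(W), rng.standard_normal(8))

Because the sparse storage keeps only index-value pairs, its memory footprint and the matrix-vector cost grow with the number of nonzero connections rather than with the full layer size, which is the mechanism behind the model-size and computation savings reported in the abstract.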
Pages: 4409 - 4412
Page count: 4
Related Papers
50 records in total
  • [1] NEURON SPARSENESS VERSUS CONNECTION SPARSENESS IN DEEP NEURAL NETWORK FOR LARGE VOCABULARY SPEECH RECOGNITION
    Kang, Jian
    Lu, Cheng
    Cai, Meng
    Zhang, Wei-Qiang
    Liu, Jia
    [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, : 4954 - 4958
  • [2] Application of Pretrained Deep Neural Networks to Large Vocabulary Speech Recognition
    Jaitly, Navdeep
    Nguyen, Patrick
    Senior, Andrew
    Vanhoucke, Vincent
    [J]. 13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012, : 2577 - 2580
  • [3] Deep Spiking Neural Networks for Large Vocabulary Automatic Speech Recognition
    Wu, Jibin
    Yilmaz, Emre
    Zhang, Malu
    Li, Haizhou
    Tan, Kay Chen
    [J]. FRONTIERS IN NEUROSCIENCE, 2020, 14
  • [4] Large Vocabulary Speech Recognition Using Deep Tensor Neural Networks
    Yu, Dong
    Deng, Li
    Seide, Frank
    [J]. 13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012, : 6 - 9
  • [5] Large Vocabulary Speech Recognition Using Deep Neural Networks: Insights, Theory, and Practice
    Yu, Dong
    [J]. 2012 8TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING, 2012, : XXXI - XXXI
  • [6] Improving Large Vocabulary Urdu Speech Recognition System using Deep Neural Networks
    Farooq, Muhammad Umar
    Adeeba, Farah
    Rauf, Sahar
    Hussain, Sarmad
    [J]. INTERSPEECH 2019, 2019, : 2978 - 2982
  • [7] EXPLOITING LSTM STRUCTURE IN DEEP NEURAL NETWORKS FOR SPEECH RECOGNITION
    He, Tianxing
    Droppo, Jasha
    [J]. 2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 5445 - 5449
  • [8] The Deep Tensor Neural Network With Applications to Large Vocabulary Speech Recognition
    Yu, Dong
    Deng, Li
    Seide, Frank
    [J]. IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2013, 21 (02): 388 - 396
  • [9] Exploiting deep neural networks for detection-based speech recognition
    Siniscalchi, Sabato Marco
    Yu, Dong
    Deng, Li
    Lee, Chin-Hui
    [J]. NEUROCOMPUTING, 2013, 106 : 148 - 157
  • [10] A CLUSTER-BASED MULTIPLE DEEP NEURAL NETWORKS METHOD FOR LARGE VOCABULARY CONTINUOUS SPEECH RECOGNITION
    Zhou, Pan
    Liu, Cong
    Liu, Qingfeng
    Dai, Lirong
    Jiang, Hui
    [J]. 2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013, : 6650 - 6654