AN ANALYSIS OF CONVOLUTIONAL NEURAL NETWORKS FOR SPEECH RECOGNITION

Cited by: 0

Authors:

Huang, Jui-Ting [1]

Li, Jinyu [1]

Gong, Yifan [1]

Affiliations:

[1] Microsoft Corp, One Microsoft Way, Redmond, WA 98052 USA

Source:

2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP) | 2015

Keywords:

Convolutional neural networks; DNN; low footprint models; maxout units

DOI:

Not available

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

Although several sites have reported the effectiveness of convolutional neural networks (CNNs) on some tasks, there has been no deep analysis of why CNNs perform well or in which cases we should expect to see their advantage. In light of this, this paper aims to provide a detailed analysis of CNNs. By visualizing the localized filters learned in the convolutional layer, we show that edge detectors in varying directions can be automatically learned. We then identify four domains in which we think CNNs can consistently provide advantages over fully-connected deep neural networks (DNNs): channel-mismatched training-test conditions, noise robustness, distant speech recognition, and low-footprint models. For distant speech recognition, a CNN trained on 1000 hours of Kinect distant speech data obtains a relative 4% word error rate reduction (WERR) over a DNN of a similar size. To our knowledge, this is the largest corpus so far reported in the literature on which CNNs have shown their effectiveness. Lastly, we establish that the CNN structure combined with maxout units is the most effective model under small-size constraints for deploying small-footprint models to devices. This setup gives a relative 9.3% WERR over DNNs with sigmoid units.

Pages: 4989-4993

Page count: 5

