Very Deep Convolutional Neural Networks for LVCSR

被引:0
|
作者
Bi, Mengxiao [1 ]
Qian, Yanmin
Yu, Kai
机构
[1] Shanghai Jiao Tong Univ, Dept Comp Sci & Engn, Key Lab Shanghai Educ Commiss Intelligent Interac, Shanghai, Peoples R China
关键词
Very Deep Convolutional Networks; Speech Recognition; Acoustic Model; CNNs; Neural Networks;
D O I
暂无
中图分类号
O42 [声学];
学科分类号
070206 ; 082403 ;
摘要
Convolutional Neural Networks (CNNs) have achieved state-of-the-art performance when embedded in large vocabulary continuous speech recognition (LVCSR) systems due to its capability of modeling local correlations and reducing translational variations. In all previous related works for ASR, only up to two convolutional layers are employed. In light of the recent success of very deep CNNs in image classification, it is of interest to investigate the deep structure of CNNs for speech recognition in detail. In contrast to image classification, the dimensionality of the speech feature, the span size of input feature and the relationship between temporal and spectral domain are new factors to consider while designing very deep CNNs. In this work, very deep CNNs are introduced for LVCSR task, by extending depth of convolutional layers up to ten. The contribution of this work is two-fold: performance improvement of very deep CNNs is investigated under different configurations; further, a better way to perform convolution operations on temporal dimension is proposed. Experiments showed that very deep CNNs offer a 8-12% relative improvement over baseline DNN system, and a 4-7% relative improvement over baseline CNN system, evaluated on both a 15-hr Callhome and a 51-hr Switchboard LVCSR tasks.
引用
收藏
页码:3259 / 3263
页数:5
相关论文
共 50 条
  • [1] Advances in Very Deep Convolutional Neural Networks for LVCSR
    Sercu, Tom
    Goel, Vaibhava
    [J]. 17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 3429 - 3433
  • [2] VERY DEEP MULTILINGUAL CONVOLUTIONAL NEURAL NETWORKS FOR LVCSR
    Sercu, Tom
    Puhrsch, Christian
    Kingsbury, Brian
    LeCun, Yann
    [J]. 2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 4955 - 4959
  • [3] DEEP CONVOLUTIONAL NEURAL NETWORKS FOR LVCSR
    Sainath, Tara N.
    Mohamed, Abdel-rahman
    Kingsbury, Brian
    Ramabhadran, Bhuvana
    [J]. 2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013, : 8614 - 8618
  • [4] IMPROVEMENTS TO DEEP CONVOLUTIONAL NEURAL NETWORKS FOR LVCSR
    Sainath, Tara N.
    Kingsbury, Brian
    Mohamed, Abdel-rahman
    Dahl, George E.
    Saon, George
    Soltau, Hagen
    Beran, Tomas
    Aravkin, Aleksandr Y.
    Ramabhadran, Bhuvana
    [J]. 2013 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING (ASRU), 2013, : 315 - 320
  • [5] VERY DEEP CONVOLUTIONAL NEURAL NETWORKS FOR RAW WAVEFORMS
    Dai, Wei
    Dai, Chia
    Qu, Shuhui
    Li, Juncheng
    Das, Samarjit
    [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 421 - 425
  • [6] Combining Very Deep Convolutional Neural Networks and Recurrent Neural Networks for Video Classification
    Kiziltepe, Rukiye Savran
    Gan, John Q.
    Escobar, Juan Jose
    [J]. ADVANCES IN COMPUTATIONAL INTELLIGENCE, IWANN 2019, PT II, 2019, 11507 : 811 - 822
  • [7] VERY DEEP CONVOLUTIONAL NEURAL NETWORKS FOR ROBUST SPEECH RECOGNITION
    Qian, Yanmin
    Woodland, Philip C.
    [J]. 2016 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2016), 2016, : 481 - 488
  • [8] Squeezed Very Deep Convolutional Neural Networks for Text Classification
    Duque, Andrea B.
    Santos, Lua Lazaro J.
    Macedo, David
    Zanchettin, Cleber
    [J]. ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2019: THEORETICAL NEURAL COMPUTATION, PT I, 2019, 11727 : 193 - 207
  • [9] Plant identification based on very deep convolutional neural networks
    Zhu, Heyan
    Liu, Qinglin
    Qi, Yuankai
    Huang, Xinyuan
    Jiang, Feng
    Zhang, Shengping
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2018, 77 (22) : 29779 - 29797
  • [10] Very Deep Convolutional Neural Networks for Morphologic Classification of Erythrocytes
    Durant, Thomas J. S.
    Olson, Eben M.
    Schulz, Wade L.
    Torres, Richard
    [J]. CLINICAL CHEMISTRY, 2017, 63 (12) : 1847 - 1855