DEEP CONVOLUTIONAL NEURAL NETWORKS FOR LVCSR

Cited by: 0
Authors
Sainath, Tara N. [1]
Mohamed, Abdel-rahman
Kingsbury, Brian [1]
Ramabhadran, Bhuvana [1]
Affiliations
[1] IBM Corp, TJ Watson Res Ctr, Yorktown Hts, NY 10598 USA
Keywords
Neural Networks; Speech Recognition;
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification codes
070206 ; 082403 ;
Abstract
Convolutional Neural Networks (CNNs) are an alternative type of neural network that can reduce spectral variations and model the spectral correlations that exist in signals. Since speech signals exhibit both of these properties, CNNs are a more effective model for speech than Deep Neural Networks (DNNs). In this paper, we explore applying CNNs to large vocabulary continuous speech recognition (LVCSR) tasks. First, we determine the appropriate architecture to make CNNs effective compared to DNNs for LVCSR tasks. Specifically, we focus on how many convolutional layers are needed, the optimal number of hidden units, the best pooling strategy, and the best input feature type for CNNs. We then explore the behavior of neural network features extracted from CNNs on a variety of LVCSR tasks, comparing CNNs to DNNs and GMMs. We find that CNNs offer a 13-30% relative improvement over GMMs and a 4-12% relative improvement over DNNs on a 400-hr Broadcast News task and a 300-hr Switchboard task.
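The abstract's claim that convolution plus pooling can reduce spectral variation can be illustrated with a toy sketch. The code below is not the architecture from the paper: the filter weights, the pooling size, and the helper names `conv1d_valid` and `max_pool` are all assumptions chosen for illustration. It applies one filter along a frequency axis, then non-overlapping max pooling, and shows that a spectral peak shifted by one bin yields the same pooled output, provided the shift stays inside one pooling window.

```python
def conv1d_valid(signal, weights):
    """Slide `weights` along `signal` (no padding) and return the responses."""
    width = len(weights)
    return [
        sum(signal[i + j] * weights[j] for j in range(width))
        for i in range(len(signal) - width + 1)
    ]


def max_pool(values, size):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(values[i:i + size]) for i in range(0, len(values) - size + 1, size)]


# Hypothetical 10-bin "spectra": the same peak, one frequency bin apart.
spectrum_a = [0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
spectrum_b = [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]

# Assumed smoothing filter and pooling size, for illustration only.
filter_weights = [0.5, 1.0, 0.5]
pool_size = 4

pooled_a = max_pool(conv1d_valid(spectrum_a, filter_weights), pool_size)
pooled_b = max_pool(conv1d_valid(spectrum_b, filter_weights), pool_size)

print(pooled_a, pooled_b)
```

The invariance here is local: a shift large enough to move the peak response into a different pooling window would change the pooled output, which is one reason the choice of pooling strategy matters.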
Pages: 8614-8618
Page count: 5
Related papers
50 in total
  • [1] IMPROVEMENTS TO DEEP CONVOLUTIONAL NEURAL NETWORKS FOR LVCSR
    Sainath, Tara N.
    Kingsbury, Brian
    Mohamed, Abdel-rahman
    Dahl, George E.
    Saon, George
    Soltau, Hagen
    Beran, Tomas
    Aravkin, Aleksandr Y.
    Ramabhadran, Bhuvana
    [J]. 2013 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING (ASRU), 2013, : 315 - 320
  • [2] Very Deep Convolutional Neural Networks for LVCSR
    Bi, Mengxiao
    Qian, Yanmin
    Yu, Kai
    [J]. 16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 3259 - 3263
  • [3] Advances in Very Deep Convolutional Neural Networks for LVCSR
    Sercu, Tom
    Goel, Vaibhava
    [J]. 17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 3429 - 3433
  • [4] VERY DEEP MULTILINGUAL CONVOLUTIONAL NEURAL NETWORKS FOR LVCSR
    Sercu, Tom
    Puhrsch, Christian
    Kingsbury, Brian
    LeCun, Yann
    [J]. 2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 4955 - 4959
  • [5] Deep Scattering Spectra with Deep Neural Networks for LVCSR Tasks
    Sainath, Tara N.
    Peddinti, Vijayaditya
    Kingsbury, Brian
    Fousek, Petr
    Ramabhadran, Bhuvana
    Nahamoo, David
    [J]. 15TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2014), VOLS 1-4, 2014, : 900 - 904
  • [6] Investigation into the use of deep neural networks for LVCSR of Czech
    Mateju, Lukas
    Cerva, Petr
    Zdansky, Jindrich
    [J]. 2015 IEEE INTERNATIONAL WORKSHOP OF ELECTRONICS, CONTROL, MEASUREMENT, SIGNALS AND THEIR APPLICATION TO MECHATRONICS (ECMSM), 2015,
  • [7] Convolutional Neural Networks for Acoustic Modeling of Raw Time Signal in LVCSR
    Golik, Pavel
    Tueske, Zoltan
    Schlueter, Ralf
    Ney, Hermann
    [J]. 16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 26 - 30
  • [8] Deep Convolutional Neural Networks
    Gonzalez, Rafael C.
    [J]. IEEE SIGNAL PROCESSING MAGAZINE, 2018, 35 (06) : 79 - 87
  • [9] IMPROVING DEEP NEURAL NETWORKS FOR LVCSR USING DROPOUT AND SHRINKING STRUCTURE
    Zhang, Shiliang
    Bao, Yebo
    Zhou, Pan
    Jiang, Hui
    Dai, Lirong
    [J]. 2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014,
  • [10] Deep Anchored Convolutional Neural Networks
    Huang, Jiahui
    Dwivedi, Kshitij
    Roig, Gemma
    [J]. 2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS (CVPRW 2019), 2019, : 639 - 647