A DEEP NEURAL NETWORK INTEGRATED WITH FILTERBANK LEARNING FOR SPEECH RECOGNITION

被引:0
|
作者
Seki, Hiroshi [1 ]
Yamamoto, Kazumasa [1 ]
Nakagawa, Seiichi [1 ]
机构
[1] Toyohashi Univ Technol, Dept Comp Sci & Engn, Toyohashi, Aichi, Japan
关键词
automatic speech recognition; deep neural networks; acoustic models; filterbank learning; data-driven filterbank;
D O I
暂无
中图分类号
O42 [声学];
学科分类号
070206 ; 082403 ;
摘要
Deep neural networks (DNN) have achieved significant success in the field of speech recognition. One of the main advantages of the DNN is automatic feature extraction without human intervention. Therefore, we incorporate a pseudo-filterbank layer to the bottom of DNN and train the whole filterbank layer and the following networks jointly, while most systems take pre-defined mel-scale filterbanks as acoustic features to DNN. In the experiment, we use Gaussian functions instead of triangular mel-scale filterbanks. This technique enables a filterbank layer to maintain the functionality of frequency domain smoothing. The proposed method provides an 8.0% relative improvement in clean condition on ASJ+JNAS corpus and a 2.7% relative improvement on noise-corrupted ASJ+JNAS corpus compared with traditional fully-connected DNN. Experimental results show that the frame-level transformation of filterbank layer constrains flexibility and promotes learning efficiency in acoustic modeling.
引用
收藏
页码:5480 / 5484
页数:5
相关论文
共 50 条
  • [1] Discriminative Learning of Filterbank Layer within Deep Neural Network Based Speech Recognition for Speaker Adaptation
    Seki, Hiroshi
    Yamamoto, Kazumasa
    Akiba, Tomoyosi
    Nakagawa, Seiichi
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2019, E102D (02) : 364 - 374
  • [2] Transfer Learning of Deep Neural Network for Speech Emotion Recognition
    Huang, Ying
    Hu, Mingqing
    Yu, Xianguo
    Wang, Tao
    Yang, Chen
    PATTERN RECOGNITION (CCPR 2016), PT II, 2016, 663 : 721 - 729
  • [3] Active Learning for Speech Emotion Recognition Using Deep Neural Network
    Abdelwahab, Mohammed
    Busso, Carlos
    2019 8TH INTERNATIONAL CONFERENCE ON AFFECTIVE COMPUTING AND INTELLIGENT INTERACTION (ACII), 2019,
  • [4] IMPROVEMENTS TO FILTERBANK AND DELTA LEARNING WITHIN A DEEP NEURAL NETWORK FRAMEWORK
    Sainath, Tara N.
    Kingsbury, Brian
    Mohamed, Abdel-rahman
    Saon, George
    Ramabhadran, Bhuvana
    2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014,
  • [5] Stimulated Deep Neural Network for Speech Recognition
    Wu, Chunyang
    Karanasou, Penny
    Gales, Mark J. F.
    Sim, Khe Chai
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 400 - 404
  • [6] Auditory-based wavelet packet filterbank for speech recognition using neural network
    Gandhiraj, R.
    Sathidevi, P. S.
    ADCOM 2007: PROCEEDINGS OF THE 15TH INTERNATIONAL CONFERENCE ON ADVANCED COMPUTING AND COMMUNICATIONS, 2007, : 666 - +
  • [7] Speech Emotion Recognition Using Deep Neural Network and Extreme Learning Machine
    Han, Kun
    Yu, Dong
    Tashev, Ivan
    15TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2014), VOLS 1-4, 2014, : 223 - 227
  • [8] RAPID SPEAKER ADAPTATION OF NEURAL NETWORK BASED FILTERBANK LAYER FOR AUTOMATIC SPEECH RECOGNITION
    Seki, Hiroshi
    Yamamoto, Kazumasa
    Akiba, Tomoyosi
    Nakagawa, Seiichi
    2018 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2018), 2018, : 574 - 580
  • [9] Eye state recognition based on deep integrated neural network and transfer learning
    Zhao, Lei
    Wang, Zengcai
    Zhang, Guoxin
    Qi, Yazhou
    Wang, Xiaojin
    MULTIMEDIA TOOLS AND APPLICATIONS, 2018, 77 (15) : 19415 - 19438
  • [10] Filterbank Learning for Deep Neural Network Based Polyphonic Sound Event Detection
    Cakir, Emre
    Ozan, Ezgi Can
    Virtanen, Tuomas
    2016 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2016, : 3399 - 3406