A DEEP NEURAL NETWORK INTEGRATED WITH FILTERBANK LEARNING FOR SPEECH RECOGNITION

Times cited: 0
Authors
Seki, Hiroshi [1]
Yamamoto, Kazumasa [1]
Nakagawa, Seiichi [1]
Affiliations
[1] Toyohashi Univ Technol, Dept Comp Sci & Engn, Toyohashi, Aichi, Japan
Keywords
automatic speech recognition; deep neural networks; acoustic models; filterbank learning; data-driven filterbank
DOI
Not available
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Deep neural networks (DNNs) have achieved significant success in speech recognition. One of the main advantages of a DNN is automatic feature extraction without human intervention. Whereas most systems feed pre-defined mel-scale filterbank outputs to the DNN as acoustic features, we add a pseudo-filterbank layer at the bottom of the DNN and train the filterbank layer and the subsequent layers jointly. In the experiments, we use Gaussian functions instead of triangular mel-scale filters, which lets the filterbank layer retain its frequency-domain smoothing function. The proposed method provides an 8.0% relative improvement under clean conditions on the ASJ+JNAS corpus and a 2.7% relative improvement on the noise-corrupted ASJ+JNAS corpus compared with a traditional fully connected DNN. Experimental results show that the frame-level transformation performed by the filterbank layer constrains flexibility and promotes learning efficiency in acoustic modeling.
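The abstract does not give the exact parameterization of the Gaussian filterbank layer, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes each filter is a Gaussian over frequency with a gain, a center frequency, and a bandwidth, initialized on the mel scale; the function names, the initialization rule, and the log compression are all illustrative assumptions. Only the forward pass is shown; in a jointly trained system the three parameter vectors would be updated by backpropagation together with the DNN weights.

```python
import numpy as np

def hz_to_mel(f):
    # Common mel-scale formula (an assumption; the abstract does not specify one).
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def init_gaussian_params(n_filters, sample_rate):
    # Hypothetical initialization: centers evenly spaced on the mel scale,
    # bandwidths derived from the spacing of the neighboring centers.
    edges_mel = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    edges_hz = mel_to_hz(edges_mel)
    centers = edges_hz[1:-1]
    widths = (edges_hz[2:] - edges_hz[:-2]) / 4.0
    gains = np.ones(n_filters)
    return gains, centers, widths

def gaussian_filterbank(gains, centers, widths, n_fft_bins, sample_rate):
    # One Gaussian per filter, evaluated at every FFT bin frequency.
    # Result shape: (n_filters, n_fft_bins).
    freqs = np.linspace(0.0, sample_rate / 2.0, n_fft_bins)
    z = (freqs[None, :] - centers[:, None]) / widths[:, None]
    return gains[:, None] * np.exp(-0.5 * z ** 2)

def filterbank_layer(power_spectrum, weights, eps=1e-10):
    # Frame-level transformation: weighted sum over frequency bins,
    # followed by log compression (assumed here, as in log-mel features).
    return np.log(power_spectrum @ weights.T + eps)

# Usage: 100 frames of a 257-bin power spectrum at 16 kHz, 40 filters.
gains, centers, widths = init_gaussian_params(40, 16000)
weights = gaussian_filterbank(gains, centers, widths, 257, 16000)
frames = np.abs(np.random.default_rng(0).normal(size=(100, 257))) ** 2
features = filterbank_layer(frames, weights)  # would be fed to the DNN
```

Because every filter is described by three scalars rather than one free weight per frequency bin, the layer stays a smooth band-pass shape during training, which is one way to read the abstract's remark that the filterbank layer constrains flexibility.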
Pages: 5480-5484
Number of pages: 5
Related papers
50 in total
  • [31] Nonlinear Network Speech Recognition Structure in a Deep Learning Algorithm
    Meng, Liang
    Kuppuswamy, Prakash
    Upadhyay, Jinal
    Kumar, Sumit
    Athawale, Shashikant, V
    Shah, Mohd Asif
    COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE, 2022, 2022
  • [32] Accelerated Parallelizable Neural Network Learning Algorithm for Speech Recognition
    Yu, Dong
    Deng, Li
    12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011, : 2292 - 2295
  • [33] Ensemble Learning With Attention-Integrated Convolutional Recurrent Neural Network for Imbalanced Speech Emotion Recognition
    Ai, Xusheng
    Sheng, Victor S.
    Fang, Wei
    Ling, Charles X.
    Li, Chunhua
    IEEE ACCESS, 2020, 8 : 199909 - 199919
  • [34] LOCAL TRAJECTORY BASED SPEECH ENHANCEMENT FOR ROBUST SPEECH RECOGNITION WITH DEEP NEURAL NETWORK
    You, Yongbin
    Qian, Yanmin
    Yu, Kai
    2015 IEEE CHINA SUMMIT & INTERNATIONAL CONFERENCE ON SIGNAL AND INFORMATION PROCESSING, 2015, : 5 - 9
  • [35] Speech Recognition Model for Assamese Language Using Deep Neural Network
    Singh, Moirangthem Tiken
    Barman, Partha Pratim
    Gogoi, Rupjyoti
    2018 INTERNATIONAL CONFERENCE ON RECENT INNOVATIONS IN ELECTRICAL, ELECTRONICS & COMMUNICATION ENGINEERING (ICRIEECE 2018), 2018, : 2722 - 2727
  • [36] A Gender-Aware Deep Neural Network Structure for Speech Recognition
    Toktam Zoughi
    Mohammad Mehdi Homayounpour
    Iranian Journal of Science and Technology, Transactions of Electrical Engineering, 2019, 43 : 635 - 644
  • [37] Evaluation of Modified Deep Neural Network Architecture Performance for Speech Recognition
    Haque, Md Amaan
    Alex, John Sahaya Rani
    Venkatesan, Nithya
    2018 INTERNATIONAL CONFERENCE ON INTELLIGENT AND ADVANCED SYSTEM (ICIAS 2018) / WORLD ENGINEERING, SCIENCE & TECHNOLOGY CONGRESS (ESTCON), 2018
  • [38] Deep Neural Network Quantizers Outperforming Continuous Speech Recognition Systems
    Watzel, Tobias
    Li, Lujun
    Kuerzinger, Ludwig
    Rigoll, Gerhard
    SPEECH AND COMPUTER, SPECOM 2019, 2019, 11658 : 530 - 539
  • [39] Speech Recognition Based on Deep Tensor Neural Network and Multifactor Feature
    Shan, Yahui
    Liu, Min
    Zhan, Qingran
    Du, Shixuan
    Wang, Jing
    Xie, Xiang
    2019 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2019, : 650 - 654
  • [40] A Multi-Region Deep Neural Network Model in Speech Recognition
    Cui, Jia
    Saon, George
    Ramabhadran, Bhuvana
    Kingsbury, Brian
    16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 3244 - 3248