A DEEP NEURAL NETWORK INTEGRATED WITH FILTERBANK LEARNING FOR SPEECH RECOGNITION

Cited by: 0
Authors
Seki, Hiroshi [1]
Yamamoto, Kazumasa [1]
Nakagawa, Seiichi [1]
Affiliations
[1] Toyohashi Univ Technol, Dept Comp Sci & Engn, Toyohashi, Aichi, Japan
Keywords
automatic speech recognition; deep neural networks; acoustic models; filterbank learning; data-driven filterbank
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
Deep neural networks (DNNs) have achieved significant success in speech recognition. One of their main advantages is automatic feature extraction without human intervention. Most systems, however, still feed the DNN acoustic features computed with pre-defined mel-scale filterbanks. We instead add a pseudo-filterbank layer at the bottom of the DNN and train this layer jointly with the layers above it. In our experiments, the filters are Gaussian functions rather than triangular mel-scale filters, which lets the filterbank layer retain its frequency-domain smoothing function. Compared with a conventional fully connected DNN, the proposed method yields an 8.0% relative improvement on the ASJ+JNAS corpus in clean conditions and a 2.7% relative improvement on the noise-corrupted ASJ+JNAS corpus. The experimental results show that the frame-level transformation performed by the filterbank layer constrains flexibility and promotes learning efficiency in acoustic modeling.
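As a rough illustration of the idea in the abstract, the sketch below computes the forward pass of a Gaussian filterbank layer over one frame of a power spectrum. It is not the authors' implementation. The parameterization (one center, one width, and one gain per filter), the mel-spaced initialization, the log compression, and the names `init_gaussian_filterbank` and `filterbank_layer` are all assumptions made for illustration. In a jointly trained system, these parameters would presumably be updated by backpropagation together with the DNN weights; that step is omitted here.

```python
import math


def hz_to_mel(f_hz):
    """Convert frequency in Hz to the mel scale (common HTK-style formula)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def init_gaussian_filterbank(num_filters, num_bins, sample_rate):
    """Return (centers, widths) in FFT-bin units, spaced evenly on the mel scale.

    Assumption: the mel scale is used only to initialize the filters.
    """
    nyquist = sample_rate / 2.0
    mel_max = hz_to_mel(nyquist)
    # num_filters + 2 points: the two extra points mark the band edges.
    mel_points = [mel_max * i / (num_filters + 1) for i in range(num_filters + 2)]
    bin_points = [mel_to_hz(m) / nyquist * (num_bins - 1) for m in mel_points]
    centers = bin_points[1:-1]
    # Width is tied to the distance between neighbouring centers (a heuristic).
    widths = [
        max((bin_points[i + 2] - bin_points[i]) / 4.0, 0.5)
        for i in range(num_filters)
    ]
    return centers, widths


def filterbank_layer(power_spectrum, centers, widths, gains=None):
    """Apply Gaussian filters to one frame and return log filterbank energies."""
    if gains is None:
        gains = [1.0] * len(centers)
    outputs = []
    for center, width, gain in zip(centers, widths, gains):
        energy = sum(
            p * gain * math.exp(-0.5 * ((k - center) / width) ** 2)
            for k, p in enumerate(power_spectrum)
        )
        # Small floor avoids log(0) on silent frames.
        outputs.append(math.log(energy + 1e-10))
    return outputs


if __name__ == "__main__":
    centers, widths = init_gaussian_filterbank(
        num_filters=8, num_bins=129, sample_rate=16000
    )
    frame = [1.0] * 129  # flat spectrum, for demonstration only
    print(filterbank_layer(frame, centers, widths))
```

Because each Gaussian filter is a smooth function of its center, width, and gain, gradients with respect to these parameters are well defined, and however the parameters change during training, each filter still averages over neighbouring frequency bins. This is one plausible reading of how the layer maintains frequency-domain smoothing.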
Pages: 5480-5484
Number of pages: 5
Related papers
50 in total
  • [41] Binaural Deep Neural Network for Noise Robust Automatic Speech Recognition
    Jiang, Yi
    Zu, Yuan-Yuan
    INTERNATIONAL CONFERENCE ON CONTROL ENGINEERING AND AUTOMATION (ICCEA 2014), 2014, : 512 - 517
  • [42] A Gender-Aware Deep Neural Network Structure for Speech Recognition
    Zoughi, Toktam
    Homayounpour, Mohammad Mehdi
    IRANIAN JOURNAL OF SCIENCE AND TECHNOLOGY-TRANSACTIONS OF ELECTRICAL ENGINEERING, 2019, 43 (03) : 635 - 644
  • [43] Speech Emotion Recognition with Heterogeneous Feature Unification of Deep Neural Network
    Jiang, Wei
    Wang, Zheng
    Jin, Jesse S.
    Han, Xianfeng
    Li, Chunguang
    SENSORS, 2019, 19 (12)
  • [44] The Deep Tensor Neural Network With Applications to Large Vocabulary Speech Recognition
    Yu, Dong
    Deng, Li
    Seide, Frank
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2013, 21 (02) : 388 - 396
  • [45] Performance Evaluation of Deep Convolutional Maxout Neural Network in Speech Recognition
    Dehghani, Arash
    Seyyedsalehi, Seyyed Ali
    2018 25TH IRANIAN CONFERENCE ON BIOMEDICAL ENGINEERING AND 2018 3RD INTERNATIONAL IRANIAN CONFERENCE ON BIOMEDICAL ENGINEERING (ICBME), 2018, : 240 - 245
  • [46] Speech Emotion Recognition from Spectrograms with Deep Convolutional Neural Network
    Badshah, Abdul Malik
    Ahmad, Jamil
    Rahim, Nasir
    Baik, Sung Wook
    2017 INTERNATIONAL CONFERENCE ON PLATFORM TECHNOLOGY AND SERVICE (PLATCON), 2017, : 125 - 129
  • [47] Performance Optimization of Speech Recognition System with Deep Neural Network Model
    Wei Guan
    Optical Memory and Neural Networks, 2018, 27 (4) : 272 - 282
  • [48] AN EMPIRICAL STUDY OF LEARNING RATES IN DEEP NEURAL NETWORKS FOR SPEECH RECOGNITION
    Senior, Andrew
    Heigold, Georg
    Ranzato, Marc'Aurelio
    Yang, Ke
    2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013, : 6724 - 6728
  • [49] Speech Emotion Recognition Using Generative Adversarial Network and Deep Convolutional Neural Network
    Bhangale, Kishor
    Kothandaraman, Mohanaprasad
    CIRCUITS SYSTEMS AND SIGNAL PROCESSING, 2024, 43 (04) : 2341 - 2384