Sparse Wavelet Decomposition and Filter Banks with CNN Deep Learning for Speech Recognition

Citations: 0
Authors
Dai, Jingzhao [1]
Zhang, Yaan [1]
Hou, Jintao [1]
Wang, Xiewen [1]
Tan, Lizhe [1]
Jiang, Jean [2]
Affiliations
[1] Purdue Univ Northwest, Dept Elect & Comp Engn, Hammond, IN 46323 USA
[2] Purdue Univ Northwest, Coll Technol, Hammond, IN 46323 USA
Keywords
Sparse discrete wavelet decomposition; Mel filter bank; filter bank; Bandpass filter banks and convolutional neural network; NEURAL-NETWORKS
DOI
10.1109/eit.2019.8833972
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
This paper proposes speech recognition algorithms that apply CNN deep learning to features derived from the sparse discrete wavelet decomposition (SDWD) and from bandpass filter banks (BPFB). The proposed algorithms consist of three stages. First, the speech signal is decomposed into sub-band signals according to the Mel filter bank frequency specification using the SDWD or the BPFB. The power values of the sub-bands form a feature vector for each speech frame, and cascading the feature vectors of consecutive speech frames yields a two-dimensional feature image. Second, each feature image undergoes flipping operations to reduce edge effects when the standard CNN is applied. Finally, the CNN is used for training and recognition. The experimental results show that the proposed SDWD-CNN and BPFB-CNN outperform the support vector machine (SVM), K-nearest neighbors (KNN), and random forest (RF) algorithms.
Pages: 98-103
Page count: 6
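Illustrative sketch
The first two stages described in the abstract (Mel-spaced sub-band powers stacked into a feature image, then flipping) might be sketched roughly as follows. This is not the authors' implementation: the sub-band powers here come from a naive DFT with bins grouped into Mel-spaced bands, used only as a stand-in for the SDWD or BPFB; the frames are assumed to be non-overlapping; and the flipping step is assumed to mean tiling the image with its mirrored copies. The names `mel_band_edges`, `band_powers`, `feature_image`, and `mirror_extend`, along with all parameter values, are hypothetical.

```python
import cmath
import math


def hz_to_mel(f):
    # Common Mel-scale formula (one of several conventions in use).
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_band_edges(n_bands, f_low, f_high):
    """Return n_bands (low, high) edges in Hz, equally wide on the Mel scale."""
    m_low, m_high = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (m_high - m_low) / n_bands
    pts = [mel_to_hz(m_low + i * step) for i in range(n_bands + 1)]
    return list(zip(pts[:-1], pts[1:]))


def band_powers(frame, fs, edges):
    """Sum naive-DFT bin powers into half-open bands [low, high).

    Stand-in for sub-band filtering; the paper uses SDWD or BPFB instead.
    """
    n = len(frame)
    powers = [0.0] * len(edges)
    for k in range(n // 2 + 1):
        f = k * fs / n
        X = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(frame))
        p = abs(X) ** 2 / n
        for b, (lo, hi) in enumerate(edges):
            if lo <= f < hi:
                powers[b] += p
                break
    return powers


def feature_image(signal, fs, frame_len, n_bands):
    """Stack per-frame band-power vectors: one row per frame (assumed layout)."""
    edges = mel_band_edges(n_bands, 0.0, fs / 2.0)
    starts = range(0, len(signal) - frame_len + 1, frame_len)
    return [band_powers(signal[s:s + frame_len], fs, edges) for s in starts]


def mirror_extend(image):
    """Assumed flipping step: 2x2 mosaic of the image and its mirrored copies."""
    top = [row + row[::-1] for row in image]
    return top + top[::-1]


# Usage on a synthetic 1 kHz tone (hypothetical parameters).
fs = 8000
signal = [math.sin(2 * math.pi * 1000 * t / fs) for t in range(256)]
img = feature_image(signal, fs, frame_len=64, n_bands=8)
ext = mirror_extend(img)
```

In this sketch the mirrored mosaic makes the image continuous across its borders, which is one plausible way flipping could soften the edge effect of convolution; the resulting `ext` would then be the input to a CNN classifier.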