Sparse Wavelet Decomposition and Filter Banks with CNN Deep Learning for Speech Recognition

Times cited: 0
Authors
Dai, Jingzhao [1]
Zhang, Yaan [1]
Hou, Jintao [1]
Wang, Xiewen [1]
Tan, Lizhe [1]
Jiang, Jean [2]
Affiliations
[1] Purdue Univ Northwest, Dept Elect & Comp Engn, Hammond, IN 46323 USA
[2] Purdue Univ Northwest, Coll Technol, Hammond, IN 46323 USA
Keywords
Sparse discrete wavelet decomposition; Mel filter bank; filter bank; Bandpass filter banks and convolutional neural network; NEURAL-NETWORKS;
DOI
10.1109/eit.2019.8833972
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
In this paper, we propose speech recognition algorithms that apply convolutional neural network (CNN) deep learning to features obtained from either sparse discrete wavelet decomposition (SDWD) or bandpass filter banks (BPFB). The proposed algorithms consist of three stages. First, the speech signal is decomposed into sub-band signals according to the Mel filter bank frequency specification, using SDWD or BPFB. The power values of the sub-bands form a feature vector for each speech frame, and concatenating the feature vectors of consecutive frames yields a two-dimensional feature image. Second, each feature image undergoes flipping operations to reduce edge effects when the standard CNN is applied. Finally, the CNN is trained on these images and used for recognition. The experimental results demonstrate that the proposed SDWD-CNN and BPFB-CNN algorithms outperform the support vector machine (SVM), K-nearest neighbors (KNN), and random forest (RF) algorithms.
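The first two stages described in the abstract (Mel-spaced sub-band power per frame, stacking frames into a feature image, then flipping) can be sketched for a BPFB-style front end as below. This is a minimal, dependency-free sketch under stated assumptions, not the authors' implementation: the Mel formula `2595 * log10(1 + f/700)`, averaging DFT-bin power inside Mel-spaced bands (a stand-in for true bandpass filters), the frame length and hop, and reading the "flipping operation" as mirror-tiling the image are all assumptions, and every function name is hypothetical. The SDWD route and the CNN stage are not shown.

```python
import cmath
import math


def hz_to_mel(f):
    # Widely used Mel formula (assumed; the paper's exact specification may differ).
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_band_edges(n_bands, fs):
    """n_bands + 1 edges in Hz, equally spaced on the Mel scale from 0 to fs/2."""
    top = hz_to_mel(fs / 2.0)
    return [mel_to_hz(top * i / n_bands) for i in range(n_bands + 1)]


def band_powers(frame, fs, edges):
    """Stage 1 (per frame): mean DFT-bin power inside each Mel band.

    Summing bins in the DFT domain is a stand-in for a true bandpass filter bank.
    """
    n = len(frame)
    # Direct O(n^2) DFT of the non-negative-frequency bins, to stay dependency-free.
    power = [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))) ** 2
        for k in range(n // 2 + 1)
    ]
    feats = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_band = [p for k, p in enumerate(power) if lo <= k * fs / n < hi]
        feats.append(sum(in_band) / len(in_band) if in_band else 0.0)
    return feats


def feature_image(signal, fs, frame_len=256, hop=128, n_bands=20):
    """Stage 1 (whole utterance): stack frame feature vectors into a 2-D image."""
    edges = mel_band_edges(n_bands, fs)
    columns = [
        band_powers(signal[s:s + frame_len], fs, edges)
        for s in range(0, len(signal) - frame_len + 1, hop)
    ]
    # Transpose: rows are Mel bands, columns are consecutive frames.
    return [list(row) for row in zip(*columns)]


def flip_extend(image):
    """Stage 2 (assumed interpretation): tile the image with its left-right,
    up-down, and doubly flipped copies so border pixels get real neighbours."""
    top = [row + row[::-1] for row in image]  # [I | fliplr(I)]
    return top + top[::-1]                     # stacked over [flipud(I) | flipud(fliplr(I))]


if __name__ == "__main__":
    fs = 8000
    tone = [math.sin(2 * math.pi * 1000 * t / fs) for t in range(2000)]
    img = flip_extend(feature_image(tone, fs))
    print(len(img), len(img[0]))  # 40 rows x 28 columns
```

With the default parameters, a 2000-sample signal gives 14 frames and 20 bands, so the flipped image is 40 x 28. Mirror-tiling is only one plausible reading of "flipping"; a CNN would then be trained on these images, which is outside the scope of this sketch.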
Pages: 98-103
Number of pages: 6
Related papers (50 in total)
  • [31] Double sparse learning model for speech emotion recognition
    Zong, Yuan
    Zheng, Wenming
    Cui, Zhen
    Li, Qiang
    [J]. ELECTRONICS LETTERS, 2016, 52 (16) : 1410 - 1411
  • [32] TEXTURE-BASED FINGERPRINT RECOGNITION COMBINING DIRECTIONAL FILTER BANKS AND WAVELET
    Li, Chaorong
    Fu, Bo
    Li, Jianping
    Yang, Xingchun
    [J]. INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE, 2012, 26 (04)
  • [33] Deep-Sparse-Representation-Based Features for Speech Recognition
    Sharma, Pulkit
    Abrol, Vinayak
    Sao, Anil Kumar
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2017, 25 (11) : 2162 - 2175
  • [34] An automated license plate detection and recognition system based on wavelet decomposition and CNN
    Slimani, Ibtissam
    Zaarane, Abdelmoghit
    Al Okaishi, Wahban
    Atouf, Issam
    Hamdoun, Abdellatif
    [J]. ARRAY, 2020, 8
  • [35] Learning Salient Features for Speech Emotion Recognition Using CNN
    Liu, Jiamu
    Han, Wenjing
    Ruan, Huabin
    Chen, Xiaomin
    Jiang, Dongmei
    Li, Haifeng
    [J]. 2018 FIRST ASIAN CONFERENCE ON AFFECTIVE COMPUTING AND INTELLIGENT INTERACTION (ACII ASIA), 2018
  • [36] A Deep CNN Approach with Transfer Learning for Image Recognition
    Iorga, Cristian
    Neagoe, Victor-Emil
    [J]. PROCEEDINGS OF THE 11TH INTERNATIONAL CONFERENCE ON ELECTRONICS, COMPUTERS AND ARTIFICIAL INTELLIGENCE (ECAI-2019), 2019
  • [37] Korean speech recognition using deep learning
    Lee, Suji
    Han, Seokjin
    Park, Sewon
    Lee, Kyeongwon
    Lee, Jaeyong
    [J]. KOREAN JOURNAL OF APPLIED STATISTICS, 2019, 32 (02) : 213 - 227
  • [38] A novel data encryption algorithm based on wavelet filter banks and the singular value decomposition
    Koh, MS
    Rodriguez-Marek, E
    [J]. LCN 2004: 29TH ANNUAL IEEE INTERNATIONAL CONFERENCE ON LOCAL COMPUTER NETWORKS, PROCEEDINGS, 2004 : 320 - 326
  • [39] Persian speech recognition using deep learning
    Veisi, Hadi
    Haji Mani, Armita
    [J]. INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY, 2020, 23 (04) : 893 - 905
  • [40] Speech Emotion Recognition Using Deep Learning
    Alagusundari, N.
    Anuradha, R.
    [J]. ARTIFICIAL INTELLIGENCE: THEORY AND APPLICATIONS, VOL 1, AITA 2023, 2024, 843 : 313 - 325