Sparse Wavelet Decomposition and Filter Banks with CNN Deep Learning for Speech Recognition

Cited by: 0

Authors:

Dai, Jingzhao [1]

Zhang, Yaan [1]

Hou, Jintao [1]

Wang, Xiewen [1]

Tan, Lizhe [1]

Jiang, Jean [2]

Affiliations:

[1] Purdue Univ Northwest, Dept Elect & Comp Engn, Hammond, IN 46323 USA

[2] Purdue Univ Northwest, Coll Technol, Hammond, IN 46323 USA

Source:

2019 IEEE INTERNATIONAL CONFERENCE ON ELECTRO INFORMATION TECHNOLOGY (EIT) | 2019

Keywords:

Sparse discrete wavelet decomposition; Mel filter bank; filter bank; Bandpass filter banks and convolutional neural network; NEURAL-NETWORKS;

DOI:

10.1109/eit.2019.8833972

Chinese Library Classification (CLC) number:

TP39 [Computer applications];

Discipline classification code:

081203 ; 0835 ;

Abstract:

This paper proposes speech recognition algorithms that apply CNN deep learning to features derived from the sparse discrete wavelet decomposition (SDWD) and bandpass filter banks (BPFB). The proposed algorithms consist of three stages. First, the speech signal is decomposed into sub-band signals according to the Mel filter bank frequency specification using the SDWD or BPFB. The power values of the sub-bands form a feature vector for each speech frame, and cascading the feature vectors of consecutive speech frames produces a two-dimensional feature image. Second, each feature image is subjected to flipping operations to reduce edge effects when using the standard CNN. Finally, CNN deep learning is adopted for training and recognition. The experimental results demonstrate that the proposed SDWD-CNN and BPFB-CNN outperform the support vector machine (SVM), K-nearest neighbors (KNN), and random forest (RF) algorithms.
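The feature-image stage described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the sub-band decomposition here uses plain FFT bins grouped into mel-spaced bands as a stand-in for the SDWD or BPFB, and the flipping step is assumed to mean mirroring the image horizontally and vertically and tiling the copies, since the abstract does not specify it. The function names, frame length, hop size, and band count are all hypothetical.

```python
import numpy as np


def mel_band_edges(n_bands, f_min, f_max):
    """Band edges in Hz, equally spaced on the mel scale (hypothetical helper)."""
    def mel(f):
        return 2595.0 * np.log10(1.0 + f / 700.0)

    def inv_mel(m):
        return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

    return inv_mel(np.linspace(mel(f_min), mel(f_max), n_bands + 1))


def feature_image(signal, fs, frame_len=256, hop=128, n_bands=16):
    """Stack per-frame sub-band powers into an (n_bands, n_frames) image.

    Assumption: FFT bins grouped into mel-spaced bands replace the
    SDWD/BPFB decomposition named in the abstract.
    """
    edges = mel_band_edges(n_bands, 0.0, fs / 2.0)
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / fs)
    n_frames = 1 + (len(signal) - frame_len) // hop
    image = np.empty((n_bands, n_frames))
    for t in range(n_frames):
        frame = signal[t * hop : t * hop + frame_len]
        power = np.abs(np.fft.rfft(frame)) ** 2
        for b in range(n_bands):
            # Sum the power of all FFT bins that fall inside band b.
            in_band = (freqs >= edges[b]) & (freqs < edges[b + 1])
            image[b, t] = power[in_band].sum()
    return image


def flip_extend(image):
    """Mirror the image left-right and top-bottom and tile the copies.

    Assumption: this is one plausible reading of the "flipping operations";
    the result has twice the height and width and is symmetric at its borders.
    """
    top = np.hstack([image, np.fliplr(image)])
    return np.vstack([top, np.flipud(top)])
```

For a one-second signal sampled at 8 kHz, `feature_image` with these defaults yields a 16 x 61 array, and `flip_extend` turns it into a 32 x 122 array that could be fed to a CNN as a single-channel image.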


Pages: 98-103

Page count: 6
