Speech densely connected convolutional networks for small-footprint keyword spotting

Cited by: 0

Authors:

Tsung-Han Tsai

Xin-Hui Lin

Affiliations:

[1] Department of Electrical Engineering,

[2] National Central University

Source:

Multimedia Tools and Applications | 2023 / Vol. 82

Keywords:

Keyword spotting; DenseNet; Group convolution; Depthwise separable convolution; SENet;

DOI:

Not available

Abstract:

Keyword spotting is an important task for human-computer interaction (HCI). To protect privacy, recognition needs to be performed at the edge, so the goal is to maximize accuracy within a limited computational cost. This paper proposes a new keyword spotting technique based on a convolutional neural network (CNN), specifically densely connected convolutional networks (DenseNet). To make the model smaller, we replace the normal convolution with group convolution and depthwise separable convolution. We add squeeze-and-excitation networks (SENet) to increase the weight of important features and thereby improve accuracy. To investigate the effect of different convolutions on DenseNet, we built two models: SpDenseNet and SpDenseNet-L. We validated the networks using the Google speech commands dataset. Our proposed network achieved better accuracy than the other networks even with fewer parameters and floating-point operations (FLOPs). SpDenseNet achieved an accuracy of 96.3% with 122.63 K trainable parameters and 142.7 M FLOPs. Compared to the benchmark works, it uses only about 52% of the parameters and about 12% of the FLOPs. In addition, we varied the depth and width of the network to build a compact variant. It also outperforms other compact variants: SpDenseNet-L-narrow achieved an accuracy of 93.6% with 9.27 K trainable parameters and 3.47 M FLOPs. Compared to the benchmark works, the accuracy of SpDenseNet-L-narrow is improved by 3.5%, and it uses only about 47% of the parameters and about 48% of the FLOPs.
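The abstract attributes the smaller model size to replacing normal convolution with group convolution and depthwise separable convolution. The sketch below illustrates why these substitutions reduce the weight count, using the standard parameter formulas for each layer type. The layer shape (64 input channels, 64 output channels, 3x3 kernel, 4 groups) is a hypothetical example and is not taken from the paper; bias terms are ignored.

```python
# Parameter counts for one 2-D convolution layer (bias terms omitted).
# All layer sizes used here are illustrative assumptions, not SpDenseNet's.

def conv_params(c_in, c_out, k):
    """Standard convolution: every output channel sees every input channel."""
    return k * k * c_in * c_out


def group_conv_params(c_in, c_out, k, groups):
    """Group convolution: each output channel sees only c_in / groups inputs."""
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    return k * k * (c_in // groups) * c_out


def depthwise_separable_params(c_in, c_out, k):
    """Depthwise k x k filter per input channel, then a 1 x 1 pointwise mix."""
    depthwise = k * k * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


if __name__ == "__main__":
    c_in, c_out, k, groups = 64, 64, 3, 4  # hypothetical layer shape
    standard = conv_params(c_in, c_out, k)
    grouped = group_conv_params(c_in, c_out, k, groups)
    separable = depthwise_separable_params(c_in, c_out, k)
    print(f"standard:            {standard}")
    print(f"group (g={groups}):         {grouped} ({grouped / standard:.1%})")
    print(f"depthwise separable: {separable} ({separable / standard:.1%})")
```

For this hypothetical layer, group convolution keeps 1/groups of the standard weights, while the depthwise separable form keeps roughly 1/c_out + 1/k² of them. The actual savings in SpDenseNet depend on its layer configuration, which the abstract does not specify.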

Citation

Pages: 39119-39137

Page count: 18


[1] Speech densely connected convolutional networks for small-footprint keyword spotting
Tsai, Tsung-Han
Lin, Xin-Hui
[J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 82 (25) : 39119 - 39137
[2] Convolutional Neural Networks for Small-footprint Keyword Spotting
Sainath, Tara N.
Parada, Carolina
[J]. 16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 1478 - 1482
[3] Convolutional Recurrent Neural Networks for Small-Footprint Keyword Spotting
Arik, Sercan O.
Kliegl, Markus
Child, Rewon
Hestness, Joel
Gibiansky, Andrew
Fougner, Chris
Prenger, Ryan
Coates, Adam
[J]. 18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 1606 - 1610
[4] SMALL-FOOTPRINT KEYWORD SPOTTING WITH GRAPH CONVOLUTIONAL NETWORK
Chen, Xi
Yin, Shouyi
Song, Dandan
Ouyang, Peng
Liu, Leibo
Wei, Shaojun
[J]. 2019 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU 2019), 2019, : 539 - 546
[5] Reduced Model Size Deep Convolutional Neural Networks for Small-Footprint Keyword Spotting
Tsai, Tsung Han
Lin, Xin Hui
[J]. 2021 28TH IEEE INTERNATIONAL CONFERENCE ON ELECTRONICS, CIRCUITS, AND SYSTEMS (IEEE ICECS 2021), 2021,
[6] SMALL-FOOTPRINT KEYWORD SPOTTING USING DEEP NEURAL NETWORKS
Chen, Guoguo
Parada, Carolina
Heigold, Georg
[J]. 2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014,
[7] Building and benchmarking an Arabic Speech Commands dataset for small-footprint keyword spotting
Ghandoura, Abdulkader
Hjabo, Farouk
Al Dakkak, Oumayma
[J]. ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2021, 102
[8] Compact Feedforward Sequential Memory Networks for Small-footprint Keyword Spotting
Chen, Mengzhe
Zhang, Shiliang
Lei, Ming
Liu, Yong
Yao, Haitao
Gao, Jie
[J]. 19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 2663 - 2667
[9] EXPLORING REPRESENTATION LEARNING FOR SMALL-FOOTPRINT KEYWORD SPOTTING
Cui, Fan
Guo, Liyong
Wang, Quandong
Gao, Peng
Wang, Yujun
[J]. INTERSPEECH 2022, 2022, : 3258 - 3262
[10] Model compression applied to small-footprint keyword spotting
Tucker, George
Wu, Minhua
Sun, Ming
Panchapagesan, Sankaran
Fu, Gengshen
Vitaladevuni, Shiv
[J]. 17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 1878 - 1882
