Error-Diffusion Based Speech Feature Quantization for Small-Footprint Keyword Spotting

被引:1
|
作者
Luo, Mengjie [1 ,2 ]
Wang, Dingyi [1 ]
Wang, Xiaoqin [1 ,2 ]
Qiao, Shushan [1 ,2 ]
Zhou, Yumei [1 ,2 ]
机构
[1] Chinese Acad Sci, Inst Microelect, Beijing 100029, Peoples R China
[2] Univ Chinese Acad Sci, Beijing 100029, Peoples R China
关键词
Quantization (signal); Task analysis; Spectrogram; Signal processing algorithms; Filter banks; Standards; Speech processing; Keyword spotting; speech feature quantization; error diffusion; image processing; convolutional neural networks;
D O I
10.1109/LSP.2022.3179208
中图分类号
TM [电工技术]; TN [电子技术、通信技术];
学科分类号
0808 ; 0809 ;
摘要
Neural network based keyword spotting (KWS) system is a critical component for user interaction in current smart devices. Although small-footprint networks have been widely explored to reduce deployment overhead, low-precision input feature representation still lacks in-depth research. In this letter, an error-diffusion based speech feature quantization method is proposed. Specifically, our algorithm adapts image processing to quantize the input speech feature maps in arbitrary bits. Experiments show that in the 10-keyword KWS task, our 3-bit representation only brings a 0.45% average accuracy drop compared to the full-precision log-Mel spectrograms while others drop over 3%. In the 2 keywords task, our 3-bit representation produces no significant differences, while 1-bit quantization only leads to an average of 1.7% accuracy drop and is even capable of handling similar keywords and imbalanced data distribution. The result proves our method, to the best of our knowledge, is the first practical method that supports as low as 1-bit quantization for single-channel speech features in small-footprint KWS. In addition, we analyze the impact of error-diffusion directions and conclude that time-direction diffusion is more suitable for temporal convolutional networks.
引用
收藏
页码:1357 / 1361
页数:5
相关论文
共 50 条
  • [1] Speech densely connected convolutional networks for small-footprint keyword spotting
    Tsai, Tsung-Han
    Lin, Xin-Hui
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 82 (25) : 39119 - 39137
  • [2] Speech densely connected convolutional networks for small-footprint keyword spotting
    Tsung-Han Tsai
    Xin-Hui Lin
    [J]. Multimedia Tools and Applications, 2023, 82 : 39119 - 39137
  • [3] Building and benchmarking an Arabic Speech Commands dataset for small-footprint keyword spotting
    Ghandoura, Abdulkader
    Hjabo, Farouk
    Al Dakkak, Oumayma
    [J]. ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2021, 102
  • [4] Region Proposal Network Based Small-Footprint Keyword Spotting
    Hou, Jingyong
    Shi, Yangyang
    Ostendorf, Mari
    Hwang, Mei-Yuh
    Xie, Lei
    [J]. IEEE SIGNAL PROCESSING LETTERS, 2019, 26 (10) : 1471 - 1475
  • [5] Convolutional Neural Networks for Small-footprint Keyword Spotting
    Sainath, Tara N.
    Parada, Carolina
    [J]. 16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 1478 - 1482
  • [6] EXPLORING REPRESENTATION LEARNING FOR SMALL-FOOTPRINT KEYWORD SPOTTING
    Cui, Fan
    Guo, Liyong
    Wang, Quandong
    Gao, Peng
    Wang, Yujun
    [J]. INTERSPEECH 2022, 2022, : 3258 - 3262
  • [7] SMALL-FOOTPRINT KEYWORD SPOTTING WITH GRAPH CONVOLUTIONAL NETWORK
    Chen, Xi
    Yin, Shouyi
    Song, Dandan
    Ouyang, Peng
    Liu, Leibo
    Wei, Shaojun
    [J]. 2019 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU 2019), 2019, : 539 - 546
  • [8] Model compression applied to small-footprint keyword spotting
    Tucker, George
    Wu, Minhua
    Sun, Ming
    Panchapagesan, Sankaran
    Fu, Gengshen
    Vitaladevuni, Shiv
    [J]. 17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 1878 - 1882
  • [9] DEEP RESIDUAL LEARNING FOR SMALL-FOOTPRINT KEYWORD SPOTTING
    Tang, Raphael
    Lin, Jimmy
    [J]. 2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 5484 - 5488
  • [10] Text Anchor Based Metric Learning for Small-footprint Keyword Spotting
    Wang, Li
    Gu, Rongzhi
    Chen, Nuo
    Zou, Yuexian
    [J]. INTERSPEECH 2021, 2021, : 4219 - 4223