Error-Diffusion Based Speech Feature Quantization for Small-Footprint Keyword Spotting

Cited by: 1
Authors
Luo, Mengjie [1, 2]
Wang, Dingyi [1]
Wang, Xiaoqin [1, 2]
Qiao, Shushan [1, 2]
Zhou, Yumei [1, 2]
Affiliations
[1] Chinese Acad Sci, Inst Microelect, Beijing 100029, Peoples R China
[2] Univ Chinese Acad Sci, Beijing 100029, Peoples R China
Keywords
Quantization (signal); Task analysis; Spectrogram; Signal processing algorithms; Filter banks; Standards; Speech processing; Keyword spotting; speech feature quantization; error diffusion; image processing; convolutional neural networks;
DOI
10.1109/LSP.2022.3179208
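Chinese Library Classification (CLC)
TM [Electrical Engineering]; TN [Electronics and Communication Technology];
Discipline classification codes
0808; 0809
Abstract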
Neural-network-based keyword spotting (KWS) systems are a critical component of user interaction in current smart devices. Although small-footprint networks have been widely explored to reduce deployment overhead, low-precision input feature representation has not been studied in depth. In this letter, an error-diffusion based speech feature quantization method is proposed. Specifically, our algorithm adapts an image processing technique to quantize the input speech feature maps to an arbitrary number of bits. Experiments show that in the 10-keyword KWS task, our 3-bit representation causes only a 0.45% average accuracy drop compared with full-precision log-Mel spectrograms, while other methods drop by over 3%. In the 2-keyword task, our 3-bit representation produces no significant difference, while 1-bit quantization leads to an average accuracy drop of only 1.7% and can even handle similar keywords and imbalanced data distributions. These results indicate that our method is, to the best of our knowledge, the first practical method that supports quantization as low as 1 bit for single-channel speech features in small-footprint KWS. In addition, we analyze the impact of error-diffusion directions and conclude that time-direction diffusion is more suitable for temporal convolutional networks.
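As a rough illustration of the idea in the abstract, the following Python sketch quantizes a 2-D feature map to a chosen number of bits and carries each quantization error forward along the time axis. It is not the authors' algorithm: the function name `error_diffusion_quantize`, the min-max normalization, and the choice to pass the entire error to the next frame of the same frequency bin are all assumptions made for this example, since the record does not give the diffusion kernel.

```python
import numpy as np

def error_diffusion_quantize(features, bits=3):
    """Quantize a (freq_bins x time_frames) map to 2**bits levels.

    Hypothetical sketch: the whole quantization error of each frequency
    bin is added to the same bin in the next time frame.
    """
    feats = np.asarray(features, dtype=np.float64)
    lo, hi = feats.min(), feats.max()
    # Min-max normalize to [0, 1] (assumed); a constant map becomes all zeros.
    scale = hi - lo if hi > lo else 1.0
    norm = (feats - lo) / scale

    levels = 2 ** bits - 1                 # highest integer code
    codes = np.zeros(norm.shape, dtype=np.int64)
    carry = np.zeros(norm.shape[0])        # error carried along time, per bin

    for t in range(norm.shape[1]):
        value = norm[:, t] + carry         # add the error diffused from frame t-1
        q = np.clip(np.rint(value * levels), 0, levels)
        codes[:, t] = q.astype(np.int64)
        carry = value - q / levels         # residual error for frame t+1
    return codes
```

Calling `error_diffusion_quantize(spec, bits=1)` on a log-Mel spectrogram `spec` would return a binary map whose local average along time tracks the normalized input, which is the property that error diffusion is meant to preserve at very low bit widths.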
Pages: 1357-1361
Page count: 5