Custom Mandarin Keyword Spotting with Extended Long Short-Term Memory

Authors
Cao, Haitao [1]
Liu, Xi [1]
Tan, Zhiguo [1]
Yang, Zhenlun [1]
Qin, Xin [2]
Affiliations
[1] School of Information Engineering, Guangzhou Panyu Polytechnic, Guangzhou, 511483, China
[2] Institute of Big Data and Internet Innovation, Hunan University of Technology and Business, Changsha, 410205, China
Keywords
Deep neural networks
Abstract
In real-world scenarios, Deep Neural Network (DNN)-powered Keyword Spotting (KWS) systems are typically engineered as lightweight architectures that aim for high accuracy at low computational complexity on resource-limited devices. However, such lightweight designs often generalize poorly, particularly when keywords must be customized. This paper presents a two-stage method for rapidly customizing a Mandarin KWS system. First, we propose an embedding model that learns embedding representations of general Mandarin keywords. Second, we use the generalization capability of the embedding model to customize keywords through few-shot transfer learning. To further improve performance, the embedding model uses two scale blocks to fuse acoustic features and employs an Enhanced Extended Long Short-Term Memory (ExLSTM) as the backbone. Experimental results on both English and Mandarin keyword datasets highlight the advantages of the proposed embedding model. In addition, we conduct keyword customization on a self-recorded dataset containing 10 Mandarin keywords. An average accuracy of 97.45% with only five target samples demonstrates the effectiveness of our method. © 2024 International Association of Engineers. All rights reserved.
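The second stage described in the abstract, customizing keywords from a handful of samples on top of a pretrained embedding model, can be illustrated with a minimal sketch. The abstract does not specify how the few-shot transfer step is implemented, so the code below swaps in a simple nearest-prototype classifier over fixed embedding vectors as a stand-in; it is not the authors' method. The function names, the keyword labels, and the three-dimensional toy vectors are all hypothetical, and real embeddings would come from the trained embedding model.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def build_prototypes(support):
    """Average the few enrollment embeddings of each keyword into one prototype.

    support: dict mapping keyword label -> list of embedding vectors.
    """
    prototypes = {}
    for keyword, embeddings in support.items():
        dim = len(embeddings[0])
        mean = [sum(e[i] for e in embeddings) / len(embeddings) for i in range(dim)]
        prototypes[keyword] = l2_normalize(mean)
    return prototypes


def classify(query, prototypes):
    """Return the keyword whose prototype has the highest cosine similarity."""
    q = l2_normalize(query)
    scores = {
        keyword: sum(a * b for a, b in zip(q, proto))
        for keyword, proto in prototypes.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical toy data: five enrollment embeddings per custom keyword.
support = {
    "kai_deng": [[0.9, 0.1, 0.0], [1.0, 0.2, 0.1], [0.8, 0.0, 0.1],
                 [0.9, 0.2, 0.0], [1.0, 0.1, 0.1]],
    "guan_deng": [[0.1, 0.9, 0.0], [0.2, 1.0, 0.1], [0.0, 0.8, 0.1],
                  [0.2, 0.9, 0.0], [0.1, 1.0, 0.1]],
}
prototypes = build_prototypes(support)
label, score = classify([0.95, 0.15, 0.05], prototypes)
print(label, round(score, 3))
```

With a frozen embedding model, enrolling a new keyword in this style only requires averaging a few vectors, which is one reason embedding-based designs suit resource-limited devices. Fine-tuning a small classification head on the same five samples would be another plausible reading of "few-shot transfer learning".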
Pages: 1933-1942