DyConvMixer: Dynamic Convolution Mixer Architecture for Open-Vocabulary Keyword Spotting

Cited by: 1
Authors
Gharbieh, Waseem [1]
Huang, Jinmiao [1]
Wan, Qianhui [1]
Shim, Han Suk [2]
Lee, Chul [2]
Affiliations
[1] LG Elect Toronto AI Lab, Toronto, ON, Canada
[2] LG Elect Artificial Intelligence Lab, Toronto, ON, Canada
Source
Keywords
Dynamic Convolution; Open-vocabulary Keyword Spotting; User-defined Keyword Spotting; Query-by-Example; ConvMixer; QUERY;
DOI
10.21437/Interspeech.2022-11090
Chinese Library Classification number
O42 [Acoustics];
Discipline classification codes
070206; 082403;
Abstract
User-defined keyword spotting research has been gaining popularity in recent years. Building an open-vocabulary keyword spotting system with high accuracy and low power consumption remains a challenging problem. In this paper, we propose the DyConvMixer model to tackle this problem. By leveraging dynamic convolution alongside a convolutional equivalent of the MLP-Mixer architecture, we obtain an efficient and effective model that has fewer than 200K parameters and uses fewer than 11M MACs. Although our model is less than half the size of state-of-the-art RNN and CNN models, it shows competitive results on the publicly available Hey-Snips and Hey-Snapdragon datasets. In addition, we discuss the importance of designing an effective evaluation system and detail our evaluation pipeline for comparison with future work.
Pages: 5205 - 5209
Number of pages: 5
Related papers
13 in total
  • [1] Neural keyword confidence estimation for open-vocabulary keyword spotting
    Liu, Zuozhen
    Li, Ta
    Zhang, Pengyuan
    [J]. ELECTRONICS LETTERS, 2022, 58 (03) : 133 - 135
  • [2] Open-Vocabulary Keyword Spotting With Audio And Text Embeddings
    Sacchi, Niccolo
    Nanchen, Alexandre
    Jaggi, Martin
    Cernak, Milos
    [J]. INTERSPEECH 2019, 2019, : 3362 - 3366
  • [3] Keyword-dependent monaural speech enhancement for open-vocabulary keyword spotting
    Liu, Zuozhen
    Wu, Chou
    Li, Ta
    Zhao, Qingwei
    [J]. Shengxue Xuebao/Acta Acustica, 2023, 48 (02) : 415 - 424
  • [4] Learning Audio-Text Agreement for Open-vocabulary Keyword Spotting
    Shin, Hyeon-Kyeong
    Han, Hyewon
    Kim, Doyeon
    Chung, Soo-Whan
    Kang, Hong-Goo
    [J]. INTERSPEECH 2022, 2022, : 1871 - 1875
  • [5] Predicting detection filters for small footprint open-vocabulary keyword spotting
    Bluche, Theodore
    Gisselbrecht, Thibault
    [J]. INTERSPEECH 2020, 2020, : 2552 - 2556
  • [6] QbyE-MLPMixer: Query-by-Example Open-Vocabulary Keyword Spotting using MLPMixer
    Huang, Jinmiao
    Gharbieh, Waseem
    Wan, Qianhui
    Shim, Han Suk
    Lee, Hyun Chul
    [J]. INTERSPEECH 2022, 2022, : 5200 - 5204
  • [7] RNN-T BASED OPEN-VOCABULARY KEYWORD SPOTTING IN MANDARIN WITH MULTI-LEVEL DETECTION
    Liu, Zuozhen
    Li, Ta
    Zhang, Pengyuan
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 5649 - 5653
  • [8] Lattice-Free Open Vocabulary Keyword Spotting
    Ramesh, Gundluru
    Doppa, Naveen
    Murty, K. Sri Rama
    [J]. 2024 NATIONAL CONFERENCE ON COMMUNICATIONS, NCC, 2024,
  • [9] End-to-End Transformer-Based Open-Vocabulary Keyword Spotting with Location-Guided Local Attention
    Wei, Bo
    Yang, Meirong
    Zhang, Tao
    Tang, Xiao
    Huang, Xing
    Kim, Kyuhong
    Lee, Jaeyun
    Cho, Kiho
    Park, Sung-Un
    [J]. INTERSPEECH 2021, 2021, : 361 - 365
  • [10] Open-Vocabulary Keyword Detection from Super-Large Scale Speech Database
    Kanda, Naoyuki
    Sagawa, Hirohiko
    Sumiyoshi, Takashi
    Obuchi, Yasunari
    [J]. 2008 IEEE 10TH WORKSHOP ON MULTIMEDIA SIGNAL PROCESSING, VOLS 1 AND 2, 2008, : 943 - 948