Open Vocabulary Keyword Spotting through Transfer Learning from Speech Synthesis

Authors
Kesavaraj, V [1]
Vuppala, Anil [1]
Affiliations
[1] Int Inst Informat Technol Hyderabad, Speech Proc Lab, LTRC, Hyderabad, India
Keywords
Transfer learning; Text-to-Speech; Keyword spotting; Tacotron 2
DOI
10.1109/SPCOM60851.2024.10631637
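Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification code
0808; 0809
Abstract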
Identifying keywords in an open-vocabulary context is crucial for personalizing interactions with smart devices. Previous approaches to open-vocabulary keyword spotting depend on a shared embedding space created by audio and text encoders. However, these approaches suffer from heterogeneous modality representations (i.e., audio-text mismatch). To address this issue, the proposed framework leverages knowledge acquired from a pre-trained text-to-speech (TTS) system. This knowledge transfer makes the text representations derived from the text encoder aware of audio projections. The proposed approach is compared with various baseline methods across four datasets. Its robustness is evaluated across different word lengths and in an out-of-vocabulary (OOV) scenario. Additionally, the effectiveness of transfer learning from the TTS system is investigated by analyzing its different intermediate representations. The experimental results indicate that, on the challenging LibriPhrase Hard dataset, the proposed approach outperforms the cross-modality correspondence detector (CMCD) method, improving area under the curve (AUC) by 8.22% and equal error rate (EER) by 12.56%.
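The abstract reports results as AUC and EER but does not say how they were computed. As a rough illustration only, the sketch below shows one common way to obtain both metrics from audio-text similarity scores. The function name auc_and_eer, the threshold sweep, and the toy scores are assumptions made for this example and do not come from the paper.

```python
def auc_and_eer(scores, labels):
    """Return (AUC, EER) for similarity scores with binary labels (1 = match).

    Hypothetical helper for illustration; not the authors' evaluation code.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one matching and one non-matching pair")

    # AUC: probability that a random matching pair outscores a random
    # non-matching pair; ties count as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    auc = wins / (len(pos) * len(neg))

    # EER: sweep every observed score as a threshold (accept if score >= t)
    # and keep the point where false-accept and false-reject rates are closest.
    best_gap, eer = None, None
    for t in sorted(set(scores)):
        far = sum(n >= t for n in neg) / len(neg)  # false-accept rate
        frr = sum(p < t for p in pos) / len(pos)   # false-reject rate
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return auc, eer


# Toy scores (made up): four matching pairs followed by four non-matching pairs.
toy_scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.4, 0.2, 0.1]
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
toy_auc, toy_eer = auc_and_eer(toy_scores, toy_labels)
```

A higher AUC and a lower EER both indicate better separation between matching and non-matching audio-text pairs, so an "improvement" in EER means a reduction.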
Pages: 5