Open Vocabulary Keyword Spotting through Transfer Learning from Speech Synthesis

Authors
Kesavaraj, V [1]
Vuppala, Anil [1]
Affiliation
[1] International Institute of Information Technology Hyderabad, Speech Processing Laboratory, LTRC, Hyderabad, India
Keywords
Transfer learning; Text-to-Speech; Keyword spotting; Tacotron 2
DOI
10.1109/SPCOM60851.2024.10631637
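Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology]
Subject classification codes
0808; 0809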
Abstract
Identifying keywords in an open-vocabulary setting is crucial for personalizing interactions with smart devices. Previous approaches to open-vocabulary keyword spotting rely on a shared embedding space produced by audio and text encoders. However, these approaches suffer from heterogeneous modality representations (i.e., an audio-text mismatch). To address this issue, the proposed framework leverages knowledge acquired from a pre-trained text-to-speech (TTS) system. This knowledge transfer makes the text representations produced by the text encoder aware of their corresponding audio projections. The proposed approach is compared with several baseline methods on four datasets. Its robustness is evaluated across different word lengths and in an out-of-vocabulary (OOV) scenario. Additionally, the effectiveness of transfer learning from the TTS system is investigated by analyzing its different intermediate representations. Experimental results show that on the challenging LibriPhrase Hard dataset, the proposed approach outperforms the cross-modality correspondence detector (CMCD) method, improving the area under the curve (AUC) by 8.22% and the equal error rate (EER) by 12.56%.
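The abstract reports its results as AUC and EER, both of which are computed from match scores between keyword texts and audio clips. The Python sketch below shows one way to compute the two metrics from binary labels and detection scores. It is illustrative only: the toy 2-D embeddings, the use of cosine similarity as the matching score, and the function names (`cosine`, `roc_points`, `auc`, `eer`) are assumptions made here, not the paper's implementation. The proposed model may score audio-text pairs with a learned classifier rather than a fixed similarity.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two (non-zero) embedding vectors.
    ASSUMPTION: used here as a stand-in audio-text matching score."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def roc_points(labels: Sequence[int], scores: Sequence[float]) -> List[Tuple[float, float]]:
    """(FPR, TPR) points from sweeping the decision threshold high -> low.

    labels: 1 = the audio contains the keyword, 0 = it does not.
    Tied scores are consumed together, so each distinct threshold yields one point.
    """
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the trapezoidal rule."""
    pts = roc_points(labels, scores)
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))


def eer(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Equal error rate: the point where FPR equals FNR (= 1 - TPR),
    linearly interpolated between adjacent ROC points."""
    pts = roc_points(labels, scores)
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        d0 = x0 - (1.0 - y0)  # FPR - FNR at the segment start (begins at -1)
        d1 = x1 - (1.0 - y1)  # FPR - FNR at the segment end (ends at +1)
        if d0 <= 0.0 <= d1:
            if d1 == d0:
                return x0
            t = -d0 / (d1 - d0)
            return x0 + t * (x1 - x0)
    return 1.0


# Hypothetical toy data: one keyword's text embedding and four audio-clip embeddings.
text_emb = [1.0, 0.0]
audio_embs = [[0.9, 0.1], [0.2, 0.9], [0.7, 0.3], [0.1, 1.0]]
labels = [1, 0, 1, 0]
scores = [cosine(text_emb, a) for a in audio_embs]
print(auc(labels, scores), eer(labels, scores))  # prints 1.0 0.0
```

A higher AUC and a lower EER both indicate better separation of matching and non-matching pairs. Interpolating between ROC points gives a threshold-free EER estimate; with only a few trials, the value depends noticeably on that interpolation choice.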
Pages: 5