Open Vocabulary Keyword Spotting through Transfer Learning from Speech Synthesis

Times cited: 0

Authors:

Kesavaraj, V [1]

Vuppala, Anil [1]

Affiliation:

[1] Int Inst Informat Technol Hyderabad, Speech Proc Lab, LTRC, Hyderabad, India

Source:

2024 INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING AND COMMUNICATIONS, SPCOM 2024 | 2024

Keywords:

Transfer learning; Text-to-Speech; Keyword spotting; Tacotron 2

DOI:

10.1109/SPCOM60851.2024.10631637

Chinese Library Classification (CLC):

TM [Electrical engineering technology]; TN [Electronic and communication technology]

Subject classification codes:

0808; 0809

Abstract:

Identifying keywords in an open-vocabulary context is crucial for personalizing interactions with smart devices. Previous approaches to open-vocabulary keyword spotting rely on a shared embedding space formed by audio and text encoders. However, these approaches suffer from heterogeneous modality representations (i.e., audio-text mismatch). To address this issue, our proposed framework leverages knowledge acquired from a pre-trained text-to-speech (TTS) system. This knowledge transfer makes the text representations produced by the text encoder aware of their audio projections. The proposed approach is compared with several baseline methods on four datasets. Its robustness is evaluated across different word lengths and in an out-of-vocabulary (OOV) scenario. Additionally, the effectiveness of transfer learning from the TTS system is investigated by analyzing its different intermediate representations. The experimental results indicate that, on the challenging LibriPhrase Hard dataset, the proposed approach outperformed the cross-modality correspondence detector (CMCD) method, with improvements of 8.22% in area under the curve (AUC) and 12.56% in equal error rate (EER).
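The results above are reported as AUC (higher is better) and EER (lower is better). As a minimal sketch of how these two metrics are commonly computed for a keyword detector, assume each trial pairs an audio clip with a text keyword, receives a scalar match score, and carries a binary label (1 if the clip contains the keyword). The function names `auc` and `eer` are hypothetical and this is not the evaluation code used in the paper.

```python
"""Illustrative AUC and EER computation for audio-keyword match scores.

Assumption: one scalar score per (audio, keyword) trial, label 1 = keyword
present, label 0 = keyword absent. Not the paper's evaluation code.
"""
from typing import List, Sequence, Tuple


def _split(scores: Sequence[float], labels: Sequence[int]) -> Tuple[List[float], List[float]]:
    """Separate scores into positive and negative trials."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative trial")
    return pos, neg


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC via the Mann-Whitney statistic: P(positive score > negative score).

    Ties count as 1/2. This is O(P*N), fine for illustration only.
    """
    pos, neg = _split(scores, labels)
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def eer(scores: Sequence[float], labels: Sequence[int]) -> Tuple[float, float]:
    """Approximate equal error rate by sweeping thresholds over the observed scores.

    A trial is accepted when score >= threshold. Returns (eer, threshold) at the
    threshold where the false acceptance and false rejection rates are closest,
    averaging the two rates there.
    """
    pos, neg = _split(scores, labels)
    best = None  # (gap, eer_estimate, threshold)
    for t in sorted(set(scores)) + [float("inf")]:
        far = sum(n >= t for n in neg) / len(neg)  # negatives wrongly accepted
        frr = sum(p < t for p in pos) / len(pos)   # positives wrongly rejected
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]
```

For example, with scores `[0.9, 0.8, 0.35, 0.7, 0.3, 0.2]` and labels `[1, 1, 1, 0, 0, 0]`, `auc` returns 8/9 and `eer` returns an EER of 1/3 at threshold 0.7. The pairwise AUC loop and the threshold sweep are chosen for readability; for large trial lists a rank-based AUC and an interpolated ROC-curve EER are the usual choices.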


Pages: 5
