Open Vocabulary Keyword Spotting through Transfer Learning from Speech Synthesis

Cited by: 0

Authors:

Kesavaraj, V [1]

Vuppala, Anil [1]

Affiliations:

[1] Int Inst Informat Technol Hyderabad, Speech Proc Lab, LTRC, Hyderabad, India

Source:

2024 INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING AND COMMUNICATIONS, SPCOM 2024 | 2024

Keywords:

Transfer learning; Text-to-Speech; Keyword spotting; Tacotron 2

DOI:

10.1109/SPCOM60851.2024.10631637

Chinese Library Classification (CLC):

TM [Electrical engineering]; TN [Electronics and communication technology]

Discipline classification codes:

0808; 0809

Abstract:

Identifying keywords in an open-vocabulary context is crucial for personalizing interactions with smart devices. Previous approaches to open-vocabulary keyword spotting depend on a shared embedding space created by audio and text encoders. However, these approaches suffer from heterogeneous modality representations (i.e., audio-text mismatch). To address this issue, our proposed framework leverages knowledge acquired from a pre-trained text-to-speech (TTS) system. This knowledge transfer makes the text representations derived from the text encoder aware of audio projections. The performance of the proposed approach is compared with various baseline methods across four different datasets. The robustness of our proposed model is evaluated by assessing its performance across different word lengths and in an Out-of-Vocabulary (OOV) scenario. Additionally, the effectiveness of transfer learning from the TTS system is investigated by analyzing its different intermediate representations. The experimental results indicate that, on the challenging LibriPhrase Hard dataset, the proposed approach outperformed the cross-modality correspondence detector (CMCD) method, improving area under the curve (AUC) by 8.22% and equal error rate (EER) by 12.56%.
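
The abstract reports results as AUC and EER but does not describe how they were computed. As a minimal sketch of what these two metrics measure, the snippet below scores a toy set of audio-text pairs. The function name `auc_and_eer`, the scores, and the labels are illustrative assumptions, not taken from the paper.

```python
def auc_and_eer(scores, labels):
    """Return (AUC, EER) for match scores; labels are 1 (match) or 0 (non-match).

    Illustrative helper only; not the evaluation code used by the authors.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    # AUC: probability that a random positive outscores a random negative
    # (ties count as half a win).
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    auc = wins / (len(pos) * len(neg))

    # EER: sweep thresholds (accept when score >= threshold) and take the
    # point where false acceptance and false rejection rates are closest.
    best_gap, eer = float("inf"), 1.0
    for t in sorted(set(scores)) + [float("inf")]:
        far = sum(n >= t for n in neg) / len(neg)  # false acceptance rate
        frr = sum(p < t for p in pos) / len(pos)   # false rejection rate
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return auc, eer


# Hypothetical scores for eight audio-text pairs (1 = keyword matches audio).
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
toy_labels = [1, 1, 0, 1, 0, 1, 0, 0]
toy_auc, toy_eer = auc_and_eer(toy_scores, toy_labels)
print(f"AUC = {toy_auc:.4f}, EER = {toy_eer:.4f}")
```

Higher AUC and lower EER both indicate better separation between matching and non-matching pairs. On a finite test set the two error rates rarely cross exactly, so this sketch averages them at the closest threshold; interpolating the ROC curve is another common choice.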


Pages: 5

50 items in total

[31] Topic and Keyword Identification for Low-resourced Speech Using Cross-Language Transfer Learning
Chen, Wenda
Hasegawa-Johnson, Mark
Chen, Nancy F.
19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018: 2047 - 2051
[32] OPEN VOCABULARY SPOKEN DOCUMENT RETRIEVAL BY SUBWORD SEQUENCE OBTAINED FROM SPEECH RECOGNIZER
Kuriki, Go
Itoh, Yoshiaki
Kojima, Kazunori
Ishigame, Masaaki
Tanaka, Kazuyo
Lee, Shi-wook
2008 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY: SLT 2008, PROCEEDINGS, 2008: 301 - +
[33] Open set transfer learning through distribution driven active learning
Wang, Min
Wen, Ting
Jiang, Xiao-Yu
Zhang, An-An
PATTERN RECOGNITION, 2024, 146
[34] Transfer of statistical learning from passive speech perception to speech production
Murphy, Timothy K.
Nozari, Nazbanou
Holt, Lori L.
PSYCHONOMIC BULLETIN & REVIEW, 2024, 31 (03) : 1193 - 1205
[35] Bridging Over from Learning Videos to Learning Resources Through Automatic Keyword Extraction
Schulten, Cleo
Manske, Sven
Langner-Thiele, Angela
Hoppe, H. Ulrich
ARTIFICIAL INTELLIGENCE IN EDUCATION (AIED 2020), PT II, 2020, 12164 : 382 - 386
[36] A multi-phase approach for fast spotting of large vocabulary Chinese keywords from Mandarin speech using prosodic information
Bai, BR
Tseng, CY
Lee, LS
1997 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I - V: VOL I: PLENARY, EXPERT SUMMARIES, SPECIAL, AUDIO, UNDERWATER ACOUSTICS, VLSI; VOL II: SPEECH PROCESSING; VOL III: SPEECH PROCESSING, DIGITAL SIGNAL PROCESSING; VOL IV: MULTIDIMENSIONAL SIGNAL PROCESSING, NEURAL NETWORKS - VOL V: STATISTICAL SIGNAL AND ARRAY PROCESSING, APPLICATIONS, 1997: 903 - 906
[37] Lombard Speech Synthesis using Transfer Learning in a Tacotron Text-to-Speech System
Bollepalli, Bajibabu
Juvela, Lauri
Alku, Paavo
INTERSPEECH 2019, 2019: 2833 - 2837
[38] A Robust Approach to Open Vocabulary Image Retrieval with Deep Convolutional Neural Networks and Transfer Learning
Padmakumar, Vishakh
Ranga, Rishab
Elluru, Srivalya
Kamath, Sowmya S.
PROCEEDINGS OF THE 2018 PACIFIC NEIGHBORHOOD CONSORTIUM ANNUAL CONFERENCE AND JOINT MEETINGS (PNC) - HUMAN RIGHTS IN CYBERSPACE, 2018: 106 - 112
[39] Learning English vocabulary from word cards: A research synthesis
Lei, Yuanying
Reynolds, Barry Lee
FRONTIERS IN PSYCHOLOGY, 2022, 13
[40] Transfer Learning From Speech Synthesis to Voice Conversion With Non-Parallel Training Data
Zhang, Mingyang
Zhou, Yi
Zhao, Li
Li, Haizhou
IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2021, 29 : 1290 - 1302
