Open Vocabulary Keyword Spotting through Transfer Learning from Speech Synthesis

Cited by: 0

Authors:

Kesavaraj, V [1]

Vuppala, Anil [1]

Affiliations:

[1] Int Inst Informat Technol Hyderabad, Speech Proc Lab, LTRC, Hyderabad, India

Source:

2024 INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING AND COMMUNICATIONS, SPCOM 2024 | 2024

Keywords:

Transfer learning; Text-to-Speech; Keyword spotting; Tacotron 2

DOI:

10.1109/SPCOM60851.2024.10631637

Chinese Library Classification (CLC) codes:

TM [Electrical engineering]; TN [Electronics and communication technology]

Discipline classification codes:

0808; 0809

Abstract:

Identifying keywords in an open-vocabulary context is crucial for personalizing interactions with smart devices. Previous approaches to open-vocabulary keyword spotting depend on a shared embedding space created by audio and text encoders. However, these approaches suffer from heterogeneous modality representations (i.e., audio-text mismatch). To address this issue, our proposed framework leverages knowledge acquired from a pre-trained text-to-speech (TTS) system. This knowledge transfer makes the text representations derived from the text encoder aware of audio projections. The performance of the proposed approach is compared with various baseline methods across four different datasets. The robustness of the proposed model is evaluated across different word lengths and in an out-of-vocabulary (OOV) scenario. Additionally, the effectiveness of transfer learning from the TTS system is investigated by analyzing its different intermediate representations. The experimental results indicate that, on the challenging LibriPhrase Hard dataset, the proposed approach outperformed the cross-modality correspondence detector (CMCD) method, improving area under the curve (AUC) by 8.22% and equal error rate (EER) by 12.56%.


Pages: 5
