Open Vocabulary Keyword Spotting through Transfer Learning from Speech Synthesis

Times cited: 0
Authors
Kesavaraj, V [1]
Vuppala, Anil [1]
Affiliation
[1] International Institute of Information Technology Hyderabad, Speech Processing Laboratory, LTRC, Hyderabad, India
Keywords
Transfer learning; Text-to-Speech; Keyword spotting; Tacotron 2
DOI
10.1109/SPCOM60851.2024.10631637
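Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology]
Subject classification code
0808; 0809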
Abstract
Identifying keywords in an open-vocabulary setting is crucial for personalizing interactions with smart devices. Previous approaches to open-vocabulary keyword spotting rely on a shared embedding space created by audio and text encoders. However, these approaches suffer from heterogeneous modality representations (i.e., audio-text mismatch). To address this issue, our proposed framework leverages knowledge acquired from a pre-trained text-to-speech (TTS) system. This knowledge transfer makes the text representations produced by the text encoder aware of their audio projections. The proposed approach is compared with various baseline methods on four datasets. Its robustness is evaluated across different word lengths and in an out-of-vocabulary (OOV) scenario. Additionally, the effectiveness of transfer learning from the TTS system is investigated by analyzing its different intermediate representations. The experimental results show that, on the challenging LibriPhrase Hard dataset, the proposed approach outperformed the cross-modality correspondence detector (CMCD) method by 8.22% in area under the curve (AUC) and 12.56% in equal error rate (EER).
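The results above are reported as AUC and EER, the two standard detection metrics for keyword spotting. As a rough illustration of how these are commonly computed from keyword-matching scores, here is a minimal pure-Python sketch. It is not the authors' evaluation code: the function names (`roc_points`, `auc`, `eer`) and the toy scores are hypothetical. The sketch assumes that higher scores mean "keyword present", and it approximates the EER by linear interpolation between neighbouring ROC points.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (false-positive rate, true-positive rate)


def roc_points(scores: List[float], labels: List[int]) -> List[Point]:
    """Sweep a decision threshold over every distinct score (hypothetical helper).

    labels: 1 = keyword present (target trial), 0 = keyword absent.
    A trial is accepted as a detection when its score >= threshold.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative trial")
    points: List[Point] = [(0.0, 0.0)]  # threshold above every score: accept nothing
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points: List[Point]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def eer(points: List[Point]) -> float:
    """Equal error rate: the operating point where the miss rate (1 - TPR)
    equals the false-positive rate, linearly interpolated on the ROC curve."""
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        d0 = (1 - y0) - x0  # miss rate minus false-alarm rate at the first point
        d1 = (1 - y1) - x1  # same quantity at the next point
        if d0 >= 0 >= d1:  # sign change: the EER lies on this segment
            t = d0 / (d0 - d1) if d0 != d1 else 0.0
            return x0 + t * (x1 - x0)
    raise ValueError("ROC curve does not cross the EER line")


if __name__ == "__main__":
    # Toy detection scores for illustration only; not data from the paper.
    scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
    labels = [1, 1, 0, 1, 0, 0]
    pts = roc_points(scores, labels)
    print(round(auc(pts), 3), round(eer(pts), 3))  # 0.889 0.333
```

A higher AUC and a lower EER both indicate better separation between keyword-present and keyword-absent trials. Production evaluations usually rely on a library ROC routine rather than this quadratic-time sweep.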
Pages: 5