Open Vocabulary Keyword Spotting through Transfer Learning from Speech Synthesis

Citations: 0
Authors
Kesavaraj, V [1]
Vuppala, Anil [1]
Affiliations
[1] Int Inst Informat Technol Hyderabad, Speech Proc Lab, LTRC, Hyderabad, India
Keywords
Transfer learning; Text-to-Speech; Keyword spotting; Tacotron 2
DOI
10.1109/SPCOM60851.2024.10631637
Chinese Library Classification (CLC)
TM [Electrical engineering]; TN [Electronic and communication technology]
Discipline classification codes
0808; 0809
Abstract
Identifying keywords in an open-vocabulary context is crucial for personalizing interactions with smart devices. Previous approaches to open-vocabulary keyword spotting depend on a shared embedding space created by audio and text encoders. However, these approaches suffer from heterogeneous modality representations (i.e., audio-text mismatch). To address this issue, our proposed framework leverages knowledge acquired from a pre-trained text-to-speech (TTS) system. This knowledge transfer allows the text representations derived from the text encoder to incorporate awareness of audio projections. The performance of the proposed approach is compared with various baseline methods across four different datasets. The robustness of the proposed model is evaluated by assessing its performance across different word lengths and in an out-of-vocabulary (OOV) scenario. Additionally, the effectiveness of transfer learning from the TTS system is investigated by analyzing its different intermediate representations. The experimental results indicate that, on the challenging LibriPhrase Hard dataset, the proposed approach outperforms the cross-modality correspondence detector (CMCD) method by 8.22% in area under the curve (AUC) and 12.56% in equal error rate (EER).
Pages: 5