QUERY-BY-EXAMPLE SPOKEN TERM DETECTION USING ATTENTION-BASED MULTI-HOP NETWORKS

Authors
Ao, Chia-Wei [1]
Lee, Hung-yi [1]
Affiliations
[1] Natl Taiwan Univ, Grad Inst Commun Engn, Taipei, Taiwan
Keywords
Attention-based Multi-hop Network;
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification codes
070206; 082403;
Abstract
Retrieving spoken content with spoken queries, or query-by-example spoken term detection (STD), is attractive because it makes possible the matching of signals directly on the acoustic level without transcribing them into text. Here, we propose an end-to-end query-by-example STD model based on an attention-based multi-hop network, whose input is a spoken query and an audio segment containing several utterances; the output states whether the audio segment includes the query. The model can be trained in either a supervised scenario using labeled data, or in an unsupervised fashion. In the supervised scenario, we find that the attention mechanism and multiple hops improve performance, and that the attention weights indicate the time span of the detected terms. In the unsupervised setting, the model mimics the behavior of DTW, and it performs as well as DTW but with a lower run-time complexity.
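The abstract does not give the architecture, so the following is only a minimal sketch of the general multi-hop attention idea, under stated assumptions: the query and the audio frames are taken to be already encoded as fixed-size vectors (the learned encoders and the final classifier are omitted), relevance is scored by a plain dot product, and the query is refined after each hop by adding the attention-weighted frame summary. The names `softmax`, `multi_hop_attention`, and `hops` are hypothetical and do not come from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def multi_hop_attention(query, frames, hops=2):
    """Attend over frame vectors for several hops (illustrative only).

    query  : list of floats, an assumed pre-encoded query vector
    frames : list of frame vectors with the same dimension as the query
    Returns the refined query and the attention weights of every hop.
    """
    q = list(query)
    history = []
    for _ in range(hops):
        # Dot-product relevance of each frame to the current query (assumption).
        scores = [sum(qi * fi for qi, fi in zip(q, frame)) for frame in frames]
        weights = softmax(scores)
        # Attention-weighted summary of the audio segment.
        summary = [
            sum(w * frame[d] for w, frame in zip(weights, frames))
            for d in range(len(q))
        ]
        # Refine the query with what was attended to, then hop again.
        q = [qi + si for qi, si in zip(q, summary)]
        history.append(weights)
    return q, history


# Toy segment: frames 2 and 3 resemble the query, the rest do not.
frames = [[0.0, 1.0], [0.0, 1.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
refined, history = multi_hop_attention([2.0, 0.0], frames, hops=2)
best_frame = max(range(len(frames)), key=lambda i: history[-1][i])
```

In this toy setting the largest weights of the last hop fall on the frames that resemble the query, which is the sense in which attention weights can indicate the time span of a detected term. Each hop costs one pass over the frames, whereas classic DTW fills a query-length by segment-length alignment table.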
Pages: 6264-6268
Number of pages: 5