QUERY-BY-EXAMPLE SPOKEN TERM DETECTION USING ATTENTION-BASED MULTI-HOP NETWORKS

Cited by: 0
Authors
Ao, Chia-Wei [1]
Lee, Hung-yi [1]
Affiliations
[1] Natl Taiwan Univ, Grad Inst Commun Engn, Taipei, Taiwan
Keywords
Attention-based Multi-hop Network
DOI
Not available
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Retrieving spoken content with spoken queries, or query-by-example spoken term detection (STD), is attractive because it allows signals to be matched directly at the acoustic level without transcribing them into text. Here, we propose an end-to-end query-by-example STD model based on an attention-based multi-hop network. Its input is a spoken query and an audio segment containing several utterances; its output states whether the audio segment includes the query. The model can be trained either in a supervised scenario using labeled data or in an unsupervised fashion. In the supervised scenario, we find that the attention mechanism and multiple hops improve performance, and that the attention weights indicate the time span of the detected terms. In the unsupervised setting, the model mimics the behavior of dynamic time warping (DTW) and performs as well as DTW, but with lower run-time complexity.
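The abstract does not give the network's details, so the following is only a minimal sketch of the general idea of multi-hop attention, not the authors' model. It assumes the query and each audio frame have already been encoded as fixed-size vectors; the function names, the dot-product scoring, and the additive state update are all illustrative assumptions. At each hop the current state attends over the frames, and the attended summary is added back into the state before the next hop.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def multi_hop_attention(query_vec, frames, hops=2):
    """Hypothetical multi-hop attention (assumed form, not the paper's).

    query_vec: encoded spoken query, a list of floats.
    frames:    encoded audio segment, one vector per frame.
    Returns the final state and the attention weights of every hop.
    """
    state = list(query_vec)
    history = []
    for _ in range(hops):
        # Score each frame against the current state, then normalize.
        weights = softmax([dot(state, frame) for frame in frames])
        # Attention-weighted summary of the audio segment.
        context = [
            sum(w * frame[i] for w, frame in zip(weights, frames))
            for i in range(len(state))
        ]
        # Assumed update rule: add the summary to the state for the next hop.
        state = [s + c for s, c in zip(state, context)]
        history.append(weights)
    return state, history


# Toy input: only the third frame resembles the query.
query = [1.0, 0.0]
segment = [[0.0, 1.0], [0.1, 0.9], [2.0, 0.0], [0.0, 1.0]]
final_state, attention = multi_hop_attention(query, segment, hops=2)
best_frame = max(range(len(segment)), key=lambda i: attention[-1][i])
```

In this toy input the attention concentrates on the frame that resembles the query, and more strongly on the second hop than on the first, which is the sense in which attention weights can point to where a term occurs. A trained model would additionally need learned encoders and a classifier that maps the final state to a yes/no decision; both are omitted here.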
Pages: 6264-6268
Page count: 5
Related papers
50 in total
  • [31] Attention-based Multi-hop Reasoning for Knowledge Graph
    Wang, Zikang
    Li, Linjing
    Zeng, Daniel Dajun
    Chen, Yue
    2018 IEEE INTERNATIONAL CONFERENCE ON INTELLIGENCE AND SECURITY INFORMATICS (ISI), 2018, : 211 - 213
  • [32] Unsupervised Bottleneck Features for Low-Resource Query-by-Example Spoken Term Detection
    Chen, Hongjie
    Leung, Cheung-Chi
    Xie, Lei
    Ma, Bin
    Li, Haizhou
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 923 - 927
  • [33] UNSUPERVISED ACOUSTIC SUB-WORD UNIT DETECTION FOR QUERY-BY-EXAMPLE SPOKEN TERM DETECTION
    Huijbregts, Marijn
    McLaren, Mitchell
    van Leeuwen, David
    2011 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2011, : 4436 - 4439
  • [34] PAIRWISE LEARNING USING MULTI-LINGUAL BOTTLENECK FEATURES FOR LOW-RESOURCE QUERY-BY-EXAMPLE SPOKEN TERM DETECTION
    Yuan, Yougen
    Leung, Cheung-Chi
    Xie, Lei
    Chen, Hongjie
    Ma, Bin
    Li, Haizhou
    2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 5645 - 5649
  • [35] Siamese Recurrent Auto-Encoder Representation for Query-by-Example Spoken Term Detection
    Zhu, Ziwei
    Wu, Zhiyong
    Li, Runnan
    Meng, Helen
    Cai, Lianhong
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 102 - 106
  • [36] HIGH-PERFORMANCE QUERY-BY-EXAMPLE SPOKEN TERM DETECTION ON THE SWS 2013 EVALUATION
    Rodriguez-Fuentes, Luis J.
    Varona, Amparo
    Penagarikano, Mikel
    Bordel, German
    Diez, Mireia
    2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014,
  • [37] Combined MFCC-FBCC Features for Unsupervised Query-by-Example Spoken Term Detection
    Vasudev, Drisya
    Gangashetty, Suryakanth V.
    Babu, K. K. Anish
    Riyas, K. S.
    INTELLIGENT SYSTEMS TECHNOLOGIES AND APPLICATIONS, VOL 1, 2016, 384 : 511 - 519
  • [38] Multitask Feature Learning for Low-Resource Query-by-Example Spoken Term Detection
    Chen, Hongjie
    Leung, Cheung-Chi
    Xie, Lei
    Ma, Bin
    Li, Haizhou
    IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2017, 11 (08) : 1329 - 1339
  • [39] PHONETIC UNIT SELECTION FOR CROSS-LINGUAL QUERY-BY-EXAMPLE SPOKEN TERM DETECTION
    Lopez-Otero, Paula
    Docio-Fernandez, Laura
    Garcia-Mateo, Carmen
    2015 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING (ASRU), 2015, : 223 - 229
  • [40] A Refined Query-by-Example Approach to Spoken-Term-Detection on ESL Learners' Speech
    Hou, Jingyong
    Hu, Wenping
    Soong, Frank K.
    Xie, Lei
    2018 11TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2018, : 111 - 115