QUERY-BY-EXAMPLE SPOKEN TERM DETECTION USING ATTENTION-BASED MULTI-HOP NETWORKS

Cited by: 0

Authors:

Ao, Chia-Wei [1]

Lee, Hung-yi [1]

Affiliations:

[1] Natl Taiwan Univ, Grad Inst Commun Engn, Taipei, Taiwan

Source:

2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2018

Keywords:

Attention-based Multi-hop Network;

DOI:

Not available

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

Retrieving spoken content with spoken queries, or query-by-example spoken term detection (STD), is attractive because it makes possible the matching of signals directly on the acoustic level without transcribing them into text. Here, we propose an end-to-end query-by-example STD model based on an attention-based multi-hop network, whose input is a spoken query and an audio segment containing several utterances; the output states whether the audio segment includes the query. The model can be trained in either a supervised scenario using labeled data, or in an unsupervised fashion. In the supervised scenario, we find that the attention mechanism and multiple hops improve performance, and that the attention weights indicate the time span of the detected terms. In the unsupervised setting, the model mimics the behavior of DTW, and it performs as well as DTW but with a lower run-time complexity.
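As a rough illustration of the multi-hop attention idea mentioned in the abstract, the sketch below lets a query vector attend over the frame vectors of an audio segment for several hops, refining the query state after each hop. It is not the authors' architecture: the dot-product scoring, the additive state update, the function name `multi_hop_attention`, and the toy vectors are all assumptions made for illustration, and a real model would learn its encoders and pass the final state to a trained classifier that decides whether the segment contains the query.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def multi_hop_attention(query_vec, frame_vecs, hops=2):
    """Hypothetical helper: attend over frames for `hops` rounds.

    Returns the final query state and the attention weights of the last hop.
    Scoring (dot product) and update (state + context) are assumed choices.
    """
    state = list(query_vec)
    weights = []
    for _ in range(hops):
        # Score every frame against the current query state.
        weights = softmax([dot(state, f) for f in frame_vecs])
        # Attention-weighted sum of the frames.
        context = [
            sum(w * f[d] for w, f in zip(weights, frame_vecs))
            for d in range(len(state))
        ]
        # Refine the query state with what was attended to.
        state = [s + c for s, c in zip(state, context)]
    return state, weights


# Toy data (made up): frames 2 and 3 resemble the query, the rest do not.
query = [1.0, 0.0]
frames = [[0.0, 1.0], [0.1, 0.9], [2.0, 0.0], [1.8, 0.1], [0.0, 1.2]]
state, weights = multi_hop_attention(query, frames, hops=2)
best = max(range(len(weights)), key=weights.__getitem__)
```

In this toy setting the last-hop weights concentrate on the frames that resemble the query, which mirrors the abstract's observation that attention weights can indicate the time span of a detected term.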

Pages: 6264-6268

Page count: 5

