A fast hierarchical search algorithm for discriminative keyword spotting

Cited by: 5
Authors
Tabibian, Shima [1 ,2 ]
Akbari, Ahmad [1 ,4 ]
Nasersharif, Babak [3 ]
Affiliations
[1] Iran Univ Sci & Technol, Dept Comp Engn, Audio & Speech Proc Lab, Tehran 1465774111, Iran
[2] Minist Sci Res & Technol, Aerosp Res Inst, Tehran 14665834, Iran
[3] KN Toosi Univ Technol, Dept Comp Engn, Tehran, Iran
[4] Aerosp Res Inst, Aerosp Res Inst Lane,Mahestan St,Iran Zamin St, Tehran 14665834, Iran
Keywords
Discriminative keyword spotting; Phone-based search; Modified Viterbi search; Hierarchical search;
DOI
10.1016/j.ins.2015.12.010
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
A keyword spotter can be viewed as a binary classifier that sorts a set of uttered sentences into two groups according to whether or not they contain the target keywords. For this classification task, the keyword spotter must locate the target keywords using a fast and accurate search algorithm. In our previous work, we used a modified Viterbi (M-Viterbi) search algorithm, which has two known drawbacks. First, to locate the target keywords, it runs an exhaustive search through all possible segments of the input speech. Second, while computing the start and end time-frames of each new phone, it forces the keyword spotter to trace back and re-evaluate the timing alignments of all previous phones, even though very little data, if any, is updated as a result. These two drawbacks cause a dramatically enlarged search space and a significant increase in computational complexity. In this paper, we propose a Hierarchical Search (H-Search) algorithm that allows the system to discard some segments of the input speech at each level of the hierarchy because they are less likely to contain the target keywords. In addition, unlike the M-Viterbi algorithm, the H-Search algorithm does not require repeated evaluations when computing the current phone alignment. Compared with the M-Viterbi algorithm, this yields a narrower search space (O(TP) versus O(TPLmax), where T is the number of frames, P is the number of keyword phones and Lmax is the maximum phone duration) and a lower computational complexity (O(TPLmax) versus O(TPLmax^3)). We applied the H-Search algorithm to the classification part of the Evolutionary Discriminative Keyword Spotting (EDKWS) system introduced in our previous work. The experimental results indicate that the H-Search algorithm runs 100 times faster than the M-Viterbi algorithm, while the performance of the EDKWS system degrades by no more than two percent relative to the same system using the M-Viterbi algorithm. © 2015 Elsevier Inc. All rights reserved.
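To make the complexity comparison concrete, the following minimal Python sketch evaluates the dominant growth terms quoted in the abstract for one set of illustrative values. The function name `growth_terms` and the values of T, P and Lmax are assumptions chosen for illustration; they do not come from the paper. Constant factors are ignored, so the ratios show only how the terms scale and should not be read as the measured speed-up.

```python
def growth_terms(T, P, L_max):
    """Return the dominant growth terms quoted in the abstract.

    T: number of frames, P: number of keyword phones,
    L_max: maximum phone duration (in frames). Constant factors are ignored.
    """
    return {
        "h_search_space": T * P,               # O(TP)
        "m_viterbi_space": T * P * L_max,      # O(TPLmax)
        "h_search_ops": T * P * L_max,         # O(TPLmax)
        "m_viterbi_ops": T * P * L_max ** 3,   # O(TPLmax^3)
    }


# Hypothetical values (not from the paper): 1000 frames, 6 phones, Lmax = 20.
terms = growth_terms(T=1000, P=6, L_max=20)

# Ratio of the search-space terms grows like Lmax.
space_ratio = terms["m_viterbi_space"] // terms["h_search_space"]
# Ratio of the computational-complexity terms grows like Lmax squared.
ops_ratio = terms["m_viterbi_ops"] // terms["h_search_ops"]

print(space_ratio, ops_ratio)  # prints "20 400"
```

Under these assumed values, the search-space terms differ by a factor of Lmax and the complexity terms by a factor of Lmax squared; the actual running-time gain depends on constants and implementation details that the asymptotic terms do not capture.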
Pages: 45 - 59
Number of pages: 15
Related papers
50 in total
  • [1] Discriminative keyword spotting
    Keshet, Joseph
    Grangier, David
    Bengio, Samy
    [J]. SPEECH COMMUNICATION, 2009, 51 (04) : 317 - 329
  • [2] A Fast Algorithm for Large Vocabulary Keyword Spotting Application
    Huang, Eng-Fong
    Wang, Hsiao-Chuan
    Soong, Frank K.
    [J]. IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 1994, 2 (03) : 449 - 452
  • [3] Discriminative keyword spotting using triphones information and N-best search
    Tabibian, Shima
    Akbari, Ahmad
    Nasersharif, Babak
    [J]. INFORMATION SCIENCES, 2018, 423 : 157 - 171
  • [4] A survey on structured discriminative spoken keyword spotting
    Tabibian, Shima
    [J]. ARTIFICIAL INTELLIGENCE REVIEW, 2020, 53 (04) : 2483 - 2520
  • [5] A survey on structured discriminative spoken keyword spotting
    Shima Tabibian
    [J]. Artificial Intelligence Review, 2020, 53 : 2483 - 2520
  • [6] HMM based fast keyword spotting algorithm with no garbage models
    Sunil, S
    Palit, S
    Sreenivas, TV
    [J]. ICICS - PROCEEDINGS OF 1997 INTERNATIONAL CONFERENCE ON INFORMATION, COMMUNICATIONS AND SIGNAL PROCESSING, VOLS 1-3: THEME: TRENDS IN INFORMATION SYSTEMS ENGINEERING AND WIRELESS MULTIMEDIA COMMUNICATIONS, 1997, : 1020 - 1023
  • [7] New search algorithm for spotting keyword embedded in unconstrained spontaneous speech
    Dai, Lirong
    Wang, Renhua
    [J]. 1997, (10):
  • [8] An application of recurrent neural networks to discriminative keyword spotting
    Fernandez, Santiago
    Graves, Alex
    Schmidhuber, Juergen
    [J]. ARTIFICIAL NEURAL NETWORKS - ICANN 2007, PT 2, PROCEEDINGS, 2007, 4669 : 220 - +
  • [9] A Fast Fuzzy Keyword Spotting Algorithm Based on Syllable Confusion Network
    Shao, Jian
    Zhao, Qingwei
    Zhang, Pengyuan
    Liu, Zhaojie
    Yan, Yonghong
    [J]. INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4, 2007, : 1665 - 1668
  • [10] Discriminative Keyword Spotting for limited-data applications
    Benisty, Hadas
    Katz, Itamar
    Crammer, Koby
    Malah, David
    [J]. SPEECH COMMUNICATION, 2018, 99 : 1 - 11