Discriminative keyword spotting using triphones information and N-best search

Cited by: 7
Authors
Tabibian, Shima [1]
Akbari, Ahmad [2]
Nasersharif, Babak [2, 3]
Affiliations
[1] Minist Sci Res & Technol, Aerosp Res Inst, Tehran 14665834, Iran
[2] Iran Univ Sci & Technol, Comp Engn Dept, Audio & Speech Proc Lab, Tehran, Iran
[3] KN Toosi Univ Technol, Comp Engn Dept, Tehran, Iran
Keywords
Discriminative keyword spotting; Hidden Markov model; Phone recognizer; Triphone; One-best search; N-best search; HIDDEN-MARKOV-MODELS; CONFIDENCE MEASURES; PHONEME; ERROR; RECOGNITION; ALGORITHM;
DOI
10.1016/j.ins.2017.09.052
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Keyword Spotting (KWS) systems can be divided into two main groups: Hidden Markov Model (HMM)-based systems and Discriminative KWS (DKWS) systems. In this paper, we propose an approach that improves a DKWS system by exploiting the advantages of HMM-based systems. The proposed DKWS system consists of a feature extraction part and a classification part, the latter comprising a classifier and a search algorithm. This paper focuses on the feature extraction part and the search algorithm. First, we propose a method that uses the advantages of a triphone-based HMM system to extend the monophone-based feature extraction proposed in our previous works to a triphone-based one. Then, we propose an N-best search algorithm to replace the one-best algorithm. Results on the TIMIT database indicate that, at false alarm rates greater than two per keyword per hour, the true detection rate of the triphone-based Evolutionary DKWS (EDKWS) system with N-best search (Tph-EDKWS-N-Best) is 4.6% higher than that of the monophone-based EDKWS system with one-best search (Mph-EDKWS-1-Best). This improvement costs a degradation of about 0.4 in the Real Time Factor (a common metric for measuring the speed of an automatic speech recognition system). Additionally, the Figure of Merit (the average true detection rate over false alarm rates from 1 to 10 per keyword per hour) of the Tph-EDKWS-N-Best system is noticeably higher than that of HMM-based KWS systems. However, the computational complexity of the Tph-EDKWS-N-Best system is considerably higher than that of the HMM-based KWS systems. © 2017 Elsevier Inc. All rights reserved.
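The two evaluation metrics named in the abstract can be illustrated with a minimal Python sketch. It assumes the Figure of Merit is a plain arithmetic mean of the true detection rates measured at 1, 2, ..., 10 false alarms per keyword per hour, as the abstract describes, and that the Real Time Factor is processing time divided by audio duration (the usual definition, which the abstract does not spell out). The function names and all numbers are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def figure_of_merit(detection_rates: Sequence[float]) -> float:
    """Average true detection rate over the ten operating points
    at 1, 2, ..., 10 false alarms per keyword per hour."""
    if len(detection_rates) != 10:
        raise ValueError("expected ten rates, one per false alarm level 1..10")
    return sum(detection_rates) / len(detection_rates)


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Processing time divided by audio duration (assumed definition).
    Values below 1.0 mean faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return processing_seconds / audio_seconds


if __name__ == "__main__":
    # Illustrative detection rates only; these are not results from the paper.
    rates = [0.60, 0.64, 0.67, 0.70, 0.72, 0.74, 0.75, 0.76, 0.77, 0.78]
    print(f"FOM: {figure_of_merit(rates):.3f}")
    print(f"RTF: {real_time_factor(45.0, 60.0):.2f}")
```

Under these assumptions, a system with a higher Figure of Merit detects more keywords on average across the ten false alarm levels, while a higher Real Time Factor indicates slower processing.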
Pages: 157-171
Number of pages: 15
Related papers
50 in total
  • [1] A fast hierarchical search algorithm for discriminative keyword spotting
    Tabibian, Shima
    Akbari, Ahmad
    Nasersharif, Babak
    [J]. INFORMATION SCIENCES, 2016, 336 : 45 - 59
  • [2] Character confidence based on N-best list for keyword spotting in online Chinese handwritten documents
    Zhang, Heng
    Wang, Da-Han
    Liu, Cheng-Lin
    [J]. PATTERN RECOGNITION, 2014, 47 (05) : 1880 - 1890
  • [3] DISCRIMINATIVE RECOGNITION RATE ESTIMATION FOR N-BEST LIST AND ITS APPLICATION TO N-BEST RESCORING
    Ogawa, Atsunori
    Hori, Takaaki
    Nakamura, Atsushi
    [J]. 2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013, : 6832 - 6836
  • [4] HALLUCINATED N-BEST LISTS FOR DISCRIMINATIVE LANGUAGE MODELING
    Sagae, K.
    Lehr, M.
    Prud'hommeaux, E.
    Xu, P.
    Glenn, N.
    Karakos, D.
    Khudanpur, S.
    Roark, B.
    Saraclar, M.
    Shafran, I.
    Bikel, D.
    Callison-Burch, C.
    Cao, Y.
    Hall, K.
    Hasler, E.
    Koehn, P.
    Lopez, A.
    Post, M.
    Rileyh, D.
    [J]. 2012 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2012, : 5001 - 5004
  • [5] DISCRIMINATIVE LEARNING USING LINGUISTIC FEATURES TO RESCORE N-BEST SPEECH HYPOTHESES
    Georgescul, Maria
    Rayner, Manny
    Bouillon, Pierrette
    Tsourakis, Nikos
    [J]. 2008 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY: SLT 2008, PROCEEDINGS, 2008, : 97 - 100
  • [6] USING N-BEST RECOGNITION OUTPUT FOR EXTRACTIVE SUMMARIZATION AND KEYWORD EXTRACTION IN MEETING SPEECH
    Liu, Yang
    Xie, Shasha
    Liu, Fei
    [J]. 2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2010, : 5310 - 5313
  • [7] NEURAL ORACLE SEARCH ON N-BEST HYPOTHESES
    Variani, Ehsan
    Chen, Tongzhou
    Apfel, James
    Ramabhadran, Bhuvana
    Lee, Seungji
    Moreno, Pedro
    [J]. 2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 7824 - 7828
  • [8] Improvement in N-best search for continuous speech recognition
    Illina, I
    Gong, YF
    [J]. ICSLP 96 - FOURTH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, VOLS 1-4, 1996, : 2147 - 2150
  • [9] Keyword spotting using an evolutionary-based classifier and discriminative features
    Tabibian, Shima
    Akbari, Ahmad
    Nasersharif, Babak
    [J]. ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2013, 26 (07) : 1660 - 1670
  • [10] Utterance verification using search confusion rate and its N-best approach
    Kim, K
    Kim, H
    Hahn, M
    [J]. ETRI JOURNAL, 2005, 27 (04) : 461 - 464