N-Gram FST Indexing for Spoken Term Detection

Authors
Liu, Chao [1]
Wang, Dong [1]
Tejedor, Javier
Affiliations
[1] Tsinghua Univ, Ctr Speech & Language Technol, Beijing, Peoples R China
Keywords
spoken term indexing; finite state transducer; spoken term detection; speech recognition
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
An efficient indexing scheme is essential for spoken term detection (STD) on large databases, particularly for phone-based systems, which have been widely adopted to achieve vocabulary-independent detection. While finite state transducer (FST) composition provides a standard indexing approach, n-gram reverse indexing is more flexible in connectivity representation and confidence measuring, and may therefore perform better than searching within the original lattices or the equivalent FSTs. In this paper we present an n-gram FST indexing approach that combines the flexibility of n-gram indexing with the efficiency of FST indexing. Specifically, we employ n-gram indexing to relax connectivity in the original lattices and then formalize the indices into an FST for online search. We demonstrate this approach on a phone-based STD task where the lattice is sparse due to strong language models. The results show that n-gram FST indexing provides not only better detection performance than lattice search, but also faster detection than both conventional n-gram indexing and FST indexing.
Pages: 2091-2094
Page count: 4
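
The n-gram indexing idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it indexes plain phone strings rather than lattices, carries no confidence scores, and does not compile the index into an FST. The function names `build_ngram_index` and `search`, the toy phone sequences, and the choice of bigrams are all assumptions made for illustration.

```python
from collections import defaultdict


def build_ngram_index(utterances, n=2):
    """Map each phone n-gram to a list of (utt_id, start_pos) postings.

    `utterances` is a hypothetical dict: utterance id -> list of phone labels.
    """
    index = defaultdict(list)
    for utt_id, phones in utterances.items():
        for i in range(len(phones) - n + 1):
            index[tuple(phones[i:i + n])].append((utt_id, i))
    return index


def search(index, query, n=2):
    """Return sorted (utt_id, start_pos) hits for a phone query.

    A hit requires the k-th query n-gram to occur k positions after
    the first one in the same utterance.
    """
    grams = [tuple(query[i:i + n]) for i in range(len(query) - n + 1)]
    if not grams:
        return []  # query shorter than n: nothing to look up
    hits = set(index.get(grams[0], []))
    for k, gram in enumerate(grams[1:], start=1):
        postings = set(index.get(gram, []))
        hits = {(u, p) for (u, p) in hits if (u, p + k) in postings}
    return sorted(hits)


# Toy data (illustrative only, not from the paper).
utterances = {
    "utt1": ["k", "ae", "t", "s"],
    "utt2": ["k", "ae", "x", "t"],
}
index = build_ngram_index(utterances)
result = search(index, ["k", "ae", "t"])
```

In this sketch a query is answered by intersecting posting lists instead of traversing each utterance, which is the general motivation for inverted n-gram indices; the paper's contribution of representing such indices as an FST is outside what the sketch covers.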