Spoken Term Detection Based on Feature Space Trajectory Information

Authors
Tian Y.-H. [1]
He Q.-H. [1]
Zheng R.-W. [1]
Wei Z. [1]
Li Y.-X. [1]
Affiliation
[1] South China University of Technology, Guangzhou, Guangdong
Funding
National Natural Science Foundation of China
Keywords
audio feature space; feature space trajectory information; limited-data source; spoken term detection
DOI
10.12263/DZXB.20220289
Abstract
Current spoken term detection is dominated by deep learning, which requires large amounts of annotated training data and is therefore difficult to apply in limited-data scenarios. This paper proposes a feature-trajectory-based spoken term detection method for such scenarios. The method starts from two observations: a word is a structured organization of smaller units such as syllables or phonemes, and every language unit has stable statistical audio features. Based on the principle of physical location, the feature distribution, temporal information, and local distinguishing information of each keyword are constructed from speech examples. Spoken keywords are then searched for using the feature trajectory information of the speech segment under test, following a hierarchical decision strategy. The method operates in an audio feature space defined by an identifier set trained on a large unlabeled speech dataset. Experimental results show that the proposed method clearly outperforms HMM-based and CRNN-based systems when fewer than 100 training samples are available. For example, with 10 training samples, the false rejection rate (FRR) and false alarm rate (FAR) of the proposed method are lower than those of the HMM-based system by an absolute 20.5% and 8.7 FP/hour, respectively. With more than 300 training samples, the proposed method achieves performance comparable to that of the CRNN-based system. © 2023 Chinese Institute of Electronics. All rights reserved.
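The abstract reports results as FRR (a percentage) and FAR (false positives per hour). As a minimal sketch of how these two quantities are commonly computed, the following assumes the usual definitions: FRR is the share of true keyword occurrences that were missed, and FAR is the number of false alarms divided by the duration of the test audio in hours. The function name `frr_and_far`, its parameters, and the counts in the usage note are hypothetical and are not taken from the paper, whose evaluation protocol may differ in detail.

```python
def frr_and_far(num_keywords, num_detected, num_false_alarms, audio_seconds):
    """Return (FRR in percent, false alarms per hour of audio).

    Assumed definitions (not confirmed by the source):
      FRR = missed keyword occurrences / true keyword occurrences
      FAR = false alarms / hours of test audio
    """
    if num_keywords <= 0 or audio_seconds <= 0:
        raise ValueError("num_keywords and audio_seconds must be positive")
    if not 0 <= num_detected <= num_keywords:
        raise ValueError("num_detected must lie between 0 and num_keywords")

    missed = num_keywords - num_detected
    frr_percent = 100.0 * missed / num_keywords
    false_alarms_per_hour = num_false_alarms / (audio_seconds / 3600.0)
    return frr_percent, false_alarms_per_hour
```

With made-up counts, `frr_and_far(200, 150, 6, 7200)` describes a two-hour test set containing 200 keyword occurrences, of which 150 were found, alongside 6 false alarms. Under these definitions, an absolute FRR reduction is the plain difference between two such percentages.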
Pages: 2915-2924
Number of pages: 9