Spoken Term Detection Based on Feature Space Trajectory Information

Authors:

Tian Y.-H. [1]

He Q.-H. [1]

Zheng R.-W. [1]

Wei Z. [1]

Li Y.-X. [1]

Affiliation:

[1] South China University of Technology, Guangzhou, Guangdong

Source:

Tien Tzu Hsueh Pao/Acta Electronica Sinica | 2023, Vol. 51, No. 10

Funding:

National Natural Science Foundation of China

Keywords:

audio feature space; feature space trajectory information; limited-data source; spoken term detection

DOI:

10.12263/DZXB.20220289

Abstract:

Current spoken term detection is dominated by deep learning, which requires large amounts of annotated training data and is therefore difficult to apply in limited-data scenarios. This paper proposes a feature-trajectory-based method of spoken term detection for such scenarios. The method starts from the observation that a word is a structured organization of smaller units, such as syllables or phonemes, and that any language unit has stable statistical audio features. Based on the principle of physical location, the feature distribution, temporal information, and local distinguishing information of keywords are constructed from speech examples. Spoken keywords are then searched for using the feature trajectory information of the speech segment under detection, following a hierarchical decision strategy. The method operates in an audio feature space defined by an identifier set trained on a large unlabeled speech dataset. Experimental results show that the proposed method is clearly superior to HMM and CRNN when fewer than 100 training samples are available. For example, with 10 training samples, the false rejection rate (FRR) and false alarm rate (FAR) of the proposed method are reduced by an absolute 20.5% and by 8.7 FP/hour, respectively, compared with the HMM-based system. With more than 300 training samples, the proposed method achieves performance comparable to that of the CRNN-based system. © 2023 Chinese Institute of Electronics. All rights reserved.
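The abstract reports results as a false rejection rate (a percentage) and a false alarm rate in false positives per hour. As a minimal sketch of how these two metrics are conventionally computed, assuming the usual definitions rather than the paper's exact evaluation protocol, the Python below uses hypothetical function names and made-up counts; none of the numbers come from the paper.

```python
def false_rejection_rate(num_missed: int, num_keyword_occurrences: int) -> float:
    """Fraction of true keyword occurrences that the detector failed to report."""
    if num_keyword_occurrences <= 0:
        raise ValueError("num_keyword_occurrences must be positive")
    return num_missed / num_keyword_occurrences


def false_alarms_per_hour(num_false_alarms: int, audio_seconds: float) -> float:
    """Number of spurious detections normalized by the hours of audio searched."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return num_false_alarms / (audio_seconds / 3600.0)


# Hypothetical counts, for illustration only (not results from the paper).
frr = false_rejection_rate(num_missed=12, num_keyword_occurrences=200)
far = false_alarms_per_hour(num_false_alarms=9, audio_seconds=2 * 3600)

print(f"FRR = {frr:.1%}, FAR = {far:.1f} FP/hour")
```

Under these definitions, an absolute FRR reduction of 20.5% means a difference of 20.5 percentage points between two systems, and a reduction of 8.7 FP/hour is a difference between their normalized false alarm counts.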

Pages: 2915-2924

Page count: 9
