Combination of diverse subword units in spoken term detection

Cited by: 0

Authors:

Lee, Shi-wook [1]

Tanaka, Kazuyo [2]

Itoh, Yoshiaki [3]

Affiliations:

[1] Natl Inst Adv Ind Sci & Technol, Tokyo, Japan

[2] Univ Tsukuba, Tsukuba, Ibaraki 305, Japan

[3] Iwate Prefectural Univ, Takizawa, Iwate, Japan

Source:

16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5 | 2015

Keywords:

spoken term detection; keyword search; system combination; phonetic recognition; diversity;

DOI:

Not available

Chinese Library Classification number:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

This paper focuses on two points. First, we try to clarify the effect of system combination from two aspects: accuracy and heterogeneity. Second, we evaluate our own subword unit, the Sub-Phonetic Segment (SPS), with the aim of maximizing the performance improvement obtained by combination. Combination systems usually yield higher performance than any individual system. The improvement is largest when the systems being combined are individually accurate and also mutually heterogeneous. Based on this consideration, we estimate heterogeneity by the correlation of the false alarm errors of the combined systems, and confirm that a lower correlation between two systems yields a larger performance improvement by combination. Comparative tests of several combination approaches are carried out on subword-based spoken term detection. Since subword-based systems use constrained linguistic knowledge, it is fairly straightforward to verify the heterogeneity of the combined systems. Experimental results show that the most significant improvements are achieved by combining two different subword units, triphone and SPS, which are highly heterogeneous and have a low correlation of false alarm detections.
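The abstract describes estimating heterogeneity through the correlation of false alarm errors between two systems. The following is a minimal sketch of one way such a correlation could be computed, not the authors' implementation. It assumes that each system's output has been reduced to a binary indicator per candidate detection (1 if the system raised a false alarm there, 0 otherwise) and uses the Pearson correlation of the two indicator lists. The function name `false_alarm_correlation` and the input layout are hypothetical.

```python
import math


def false_alarm_correlation(fa_a, fa_b):
    """Pearson correlation of two binary false-alarm indicator lists.

    fa_a[i] and fa_b[i] are 1 if system A (resp. B) produced a false alarm
    on candidate i, else 0. This per-candidate layout is an assumption made
    for illustration; the record above does not specify the exact measure.
    """
    if len(fa_a) != len(fa_b):
        raise ValueError("indicator lists must have the same length")
    if not fa_a:
        raise ValueError("indicator lists must not be empty")

    n = len(fa_a)
    mean_a = sum(fa_a) / n
    mean_b = sum(fa_b) / n

    # Population covariance and variances of the two indicator sequences.
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(fa_a, fa_b)) / n
    var_a = sum((a - mean_a) ** 2 for a in fa_a) / n
    var_b = sum((b - mean_b) ** 2 for b in fa_b) / n

    # The correlation is undefined when either sequence is constant;
    # returning 0.0 in that case is a convention chosen for this sketch.
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


if __name__ == "__main__":
    system_a = [1, 1, 0, 0]
    system_b = [1, 0, 1, 0]
    print(false_alarm_correlation(system_a, system_b))
```

Under this reading, a value near 1 means the two systems tend to raise false alarms on the same candidates, while a value near 0 means their errors are largely unrelated, which is the situation the abstract associates with the larger gain from combination.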

Pages: 3685-3689

Page count: 5
