Combination of diverse subword units in spoken term detection

Cited by: 0

Authors:

Lee, Shi-wook [1]

Tanaka, Kazuyo [2]

Itoh, Yoshiaki [3]

Affiliations:

[1] Natl Inst Adv Ind Sci & Technol, Tokyo, Japan

[2] Univ Tsukuba, Tsukuba, Ibaraki 305, Japan

[3] Iwate Prefectural Univ, Takizawa, Iwate, Japan

Source:

16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5 | 2015

Keywords:

spoken term detection; keyword search; system combination; phonetic recognition; diversity;

DOI:

Not available

Chinese Library Classification number:

O42 [Acoustics];

Subject classification code:

070206 ; 082403 ;

Abstract:

This paper focuses on two points. First, we clarify the effect of system combination from two aspects: accuracy and heterogeneity. Second, we evaluate our own subword unit, the Sub-Phonetic Segment (SPS), to maximize the performance improvement gained by combination. Combined systems usually yield higher performance than any individual system. The improvement is greatest when the systems being combined are individually accurate and also mutually heterogeneous. Based on this consideration, we estimate heterogeneity by the correlation of false alarm errors between the combined systems and confirm that lower correlation between two systems yields a larger performance improvement from combination. Comparative tests of several combination approaches are carried out on subword-based spoken term detection. Because subword-based systems use constrained linguistic knowledge, it is fairly straightforward to verify the heterogeneity of the combined systems. Experimental results show that the most significant improvements are achieved by combining two different subword units, triphone and SPS, which are highly heterogeneous and have a low correlation of false alarm detections.
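As a rough illustration of the heterogeneity estimate mentioned in the abstract, the sketch below computes a Pearson correlation between the false alarm indicators of two systems over a shared list of candidate detections. The abstract does not give the paper's exact formula, so the choice of Pearson correlation over binary indicators, the function name `false_alarm_correlation`, and the toy inputs are all assumptions made for illustration only.

```python
from math import sqrt


def false_alarm_correlation(fa_a, fa_b):
    """Pearson correlation between two binary false-alarm indicator lists.

    fa_a[i] and fa_b[i] are 1 if the respective system raised a false alarm
    on candidate i, else 0. This is an assumed formulation, not necessarily
    the one used in the paper.
    """
    if len(fa_a) != len(fa_b) or not fa_a:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(fa_a)
    mean_a = sum(fa_a) / n
    mean_b = sum(fa_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(fa_a, fa_b)) / n
    var_a = sum((a - mean_a) ** 2 for a in fa_a) / n
    var_b = sum((b - mean_b) ** 2 for b in fa_b) / n
    if var_a == 0 or var_b == 0:
        # Correlation is undefined for a constant indicator; report 0.0 here.
        return 0.0
    return cov / sqrt(var_a * var_b)


# Hypothetical toy data: two systems that never share a false alarm pattern.
system_a = [1, 1, 0, 0]
system_b = [1, 0, 1, 0]
print(false_alarm_correlation(system_a, system_b))
```

Under this reading, a value near zero or below suggests the two systems make different false alarm errors, which is the condition the abstract associates with a larger gain from combination.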


Pages: 3685-3689

Number of pages: 5

50 items in total

[31] Semantically Expanded Spoken Term Detection
Kozhirbayev, Zhanibek
Yessenbayev, Zhandos
IEEE ACCESS, 2024, 12 : 177844 - 177855
[32] Multilingual spoken term detection: a review
Deekshitha, G.
Mary, Leena
INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY, 2020, 23 (03) : 653 - 667
[33] USING PARALLEL TOKENIZERS WITH DTW MATRIX COMBINATION FOR LOW-RESOURCE SPOKEN TERM DETECTION
Wang, Haipeng
Lee, Tan
Leung, Cheung-Chi
Ma, Bin
Li, Haizhou
2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013, : 8545 - 8549
[34] EFFICIENT SYSTEM COMBINATION FOR SYLLABLE-CONFUSION-NETWORK-BASED CHINESE SPOKEN TERM DETECTION
Gao, Jie
Zhao, Qingwei
Yan, Yonghong
Shao, Jian
2008 6TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, 2008, : 366 - 369
[35] ASSESSING SEARCH TERM STRENGTH IN SPOKEN TERM DETECTION
Torbati, Amir Hossein Harati Nejad
Picone, Joe
2013 IEEE INTERNATIONAL MULTI-DISCIPLINARY CONFERENCE ON COGNITIVE METHODS IN SITUATION AWARENESS AND DECISION SUPPORT (COGSIMA), 2013, : 114 - 117
[36] Open-Vocabulary Spoken Document Retrieval based on new subword models and subword phonetic similarity
Iwata, Kohei
Itoh, Yoshiaki
Kojima, Kazunori
Ishigame, Masaaki
Tanaka, Kazuyo
Lee, Shi-wook
INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006, : 325 - +
[37] Model-Based Unsupervised Spoken Term Detection with Spoken Queries
Chan, Chun-an
Lee, Lin-shan
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2013, 21 (07) : 1330 - 1342
[38] Analytical comparison between position specific posterior lattices and confusion networks based on words and subword units for spoken document indexing
Pan, Yi-cheng
Chang, Hung-lin
Lee, Lin-shan
2007 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING, VOLS 1 AND 2, 2007, : 677 - 682
[39] Subword and Crossword Units for CTC Acoustic Models
Zenkel, Thomas
Sanabria, Ramon
Metze, Florian
Waibel, Alex
19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 396 - 400
[40] Robust Spoken Term Detection Using Combination of Phone-Based and Word-Based Recognition
Iwata, Kenji
Shinoda, Koichi
Furui, Sadaoki
INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5, 2008, : 2195 - 2198
