Improved subword modeling for WFST-based speech recognition

Cited by: 25

Authors:

Smit, Peter [1]

Virpioja, Sami [1]

Kurimo, Mikko [1]

Affiliations:

[1] Aalto Univ, Dept Signal Proc & Acoust, Helsinki, Finland

Source:

18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION | 2017

Funding:

Academy of Finland;

Keywords:

speech recognition; Kaldi; subword modeling; Finnish; Estonian;

DOI:

10.21437/Interspeech.2017-103

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Because the number of observed word forms in agglutinative languages is very high, subword units are often utilized in speech recognition. However, the proper use of subword units requires careful consideration of details such as silence modeling, position-dependent phones, and combination of the units. In this paper, we implement subword modeling in the Kaldi toolkit by creating a modified lexicon with finite-state transducers to represent the subword units correctly. We experiment with multiple types of word boundary markers and achieve the best results by adding a marker to the left or right side of a subword unit whenever it is not preceded or followed by a word boundary, respectively. We also compare three different toolkits that provide data-driven subword segmentations. In our experiments on a variety of Finnish and Estonian datasets, the best subword models outperform word-based models and naive subword implementations. The largest relative reduction in WER is 23% over word-based models for a Finnish read speech dataset. The results are also better than any previously published ones for the same datasets, and the improvement on all datasets is more than 5%.
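The best-performing marking scheme described in the abstract (a marker on the left or right side of a subword unit whenever that side is not a word boundary) can be illustrated with a minimal sketch. This is not the authors' implementation: the marker symbol "+", the function names, and the example segmentation are assumptions made for illustration only, and the sketch further assumes that subword units themselves never begin or end with the marker.

```python
def mark_subwords(words, marker="+"):
    """Attach boundary markers to subword units.

    `words` is a list of words, each given as a list of subword strings.
    A marker is added on the left of a unit unless it starts a word, and
    on the right unless it ends a word. The "+" symbol is an assumption.
    """
    marked = []
    for subwords in words:
        last = len(subwords) - 1
        for i, unit in enumerate(subwords):
            left = marker if i > 0 else ""
            right = marker if i < last else ""
            marked.append(f"{left}{unit}{right}")
    return marked


def join_subwords(units, marker="+"):
    """Reconstruct words from marked units (inverse of mark_subwords)."""
    words = []
    for unit in units:
        has_left = unit.startswith(marker)
        core = unit[len(marker):] if has_left else unit
        if core.endswith(marker):
            core = core[:-len(marker)]
        if has_left and words:
            # A left marker means the unit continues the previous word.
            words[-1] += core
        else:
            words.append(core)
    return words


# Hypothetical segmentation of the Finnish phrase "taloissa on".
segmented = [["talo", "i", "ssa"], ["on"]]
marked = mark_subwords(segmented)
print(marked)                 # ['talo+', '+i+', '+ssa', 'on']
print(join_subwords(marked))  # ['taloissa', 'on']
```

Because every word-internal side carries a marker, word boundaries can be recovered from the recognized unit sequence without a separate boundary token.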

Pages: 2551-2555

Page count: 5
