Improved subword modeling for WFST-based speech recognition

Cited by: 25
Authors
Smit, Peter [1 ]
Virpioja, Sami [1 ]
Kurimo, Mikko [1 ]
Affiliations
[1] Aalto Univ, Dept Signal Proc & Acoust, Helsinki, Finland
Funding
Academy of Finland;
Keywords
speech recognition; Kaldi; subword modeling; Finnish; Estonian;
DOI
10.21437/Interspeech.2017-103
Chinese Library Classification
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Because the number of observed word forms in agglutinative languages is very high, subword units are often utilized in speech recognition. However, the proper use of subword units requires careful consideration of details such as silence modeling, position-dependent phones, and combination of the units. In this paper, we implement subword modeling in the Kaldi toolkit by creating a modified lexicon with finite-state transducers to represent the subword units correctly. We experiment with multiple types of word boundary markers and achieve the best results by adding a marker to the left or right side of a subword unit whenever it is not preceded or followed by a word boundary, respectively. We also compare three different toolkits that provide data-driven subword segmentations. In our experiments on a variety of Finnish and Estonian datasets, the best subword models outperform word-based models and naive subword implementations. The largest relative reduction in WER is 23% over word-based models for a Finnish read speech dataset. The results are also better than any previously published ones for the same datasets, and the improvement on all datasets is more than 5%.
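The best-performing marking scheme described in the abstract (a marker on each side of a subword unit that does not touch a word boundary) can be illustrated with a minimal Python sketch. The `+` marker symbol, the function names `mark_subwords` and `join_subwords`, and the example segmentation are assumptions made for illustration; they are not taken from the paper or from Kaldi, and the sketch assumes the subword units themselves never begin or end with `+`.

```python
# Illustrative sketch only: the "+" marker and both function names are
# assumptions, not the paper's or Kaldi's actual implementation.
MARKER = "+"


def mark_subwords(segmented_words):
    """Mark each subword on every side that is NOT a word boundary.

    segmented_words: list of words, each given as a list of subword units.
    Returns a flat list of marked subword tokens.
    """
    tokens = []
    for subwords in segmented_words:
        last = len(subwords) - 1
        for i, unit in enumerate(subwords):
            left = MARKER if i > 0 else ""      # not preceded by a word boundary
            right = MARKER if i < last else ""  # not followed by a word boundary
            tokens.append(left + unit + right)
    return tokens


def join_subwords(tokens):
    """Rebuild words from a marked token sequence."""
    words = []
    current = ""
    for token in tokens:
        # A token without a left marker starts a new word.
        starts_word = not token.startswith(MARKER)
        if starts_word and current:
            words.append(current)
            current = ""
        current += token.strip(MARKER)
    if current:
        words.append(current)
    return words


marked = mark_subwords([["talo", "ssa"], ["on"]])
restored = join_subwords(marked)
```

With this scheme, word boundaries are recoverable from the token sequence alone, since a token starts a word exactly when it has no left marker.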
Pages: 2551-2555
Page count: 5