Improved subword modeling for WFST-based speech recognition

Cited by: 25
Authors
Smit, Peter [1]
Virpioja, Sami [1]
Kurimo, Mikko [1]
Affiliations
[1] Aalto Univ, Dept Signal Proc & Acoust, Helsinki, Finland
Funding
Academy of Finland
Keywords
speech recognition; Kaldi; subword modeling; Finnish; Estonian;
DOI
10.21437/Interspeech.2017-103
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Because the number of observed word forms in agglutinative languages is very high, subword units are often used in speech recognition. However, the proper use of subword units requires careful consideration of details such as silence modeling, position-dependent phones, and combination of the units. In this paper, we implement subword modeling in the Kaldi toolkit by creating a modified lexicon with finite-state transducers to represent the subword units correctly. We experiment with multiple types of word boundary markers and achieve the best results by adding a marker to the left or right side of a subword unit whenever it is not preceded or followed by a word boundary, respectively. We also compare three different toolkits that provide data-driven subword segmentations. In our experiments on a variety of Finnish and Estonian datasets, the best subword models outperform word-based models and naive subword implementations. The largest relative reduction in WER is 23% over word-based models for a Finnish read speech dataset. The results are also better than any previously published ones for the same datasets, and the improvement on all datasets is more than 5%.
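The best-performing marking scheme described in the abstract can be illustrated with a minimal Python sketch. It is only an illustration of the text-level marking and its inversion, not the paper's Kaldi lexicon FST construction. The marker symbol `+`, the function names `mark_subwords` and `join_subwords`, and the example segmentation are assumptions made for this sketch.

```python
from typing import List

MARKER = "+"  # assumed marker symbol; the abstract does not name one


def mark_subwords(words: List[List[str]], marker: str = MARKER) -> List[str]:
    """Flatten pre-segmented words into marked subword tokens.

    A marker is added on the left of a unit when it is not preceded by a
    word boundary, and on the right when it is not followed by one.
    """
    tokens = []
    for pieces in words:
        last = len(pieces) - 1
        for i, piece in enumerate(pieces):
            left = marker if i > 0 else ""
            right = marker if i < last else ""
            tokens.append(f"{left}{piece}{right}")
    return tokens


def join_subwords(tokens: List[str], marker: str = MARKER) -> List[str]:
    """Rebuild words from marked tokens (inverse of mark_subwords)."""
    words = []
    current = ""
    for token in tokens:
        has_left = token.startswith(marker)
        has_right = token.endswith(marker)
        start = len(marker) if has_left else 0
        end = len(token) - (len(marker) if has_right else 0)
        current += token[start:end]
        if not has_right:  # no right marker: a word boundary follows
            words.append(current)
            current = ""
    if current:  # tolerate a dangling right marker at the end
        words.append(current)
    return words


# Hypothetical segmentation of the Finnish phrase "talossa on".
segmented = [["talo", "ssa"], ["on"]]
marked = mark_subwords(segmented)
restored = join_subwords(marked)
```

Because every unit carries its own boundary information, the original word sequence can be recovered from the token sequence alone, without a separate word boundary token.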
Pages: 2551-2555
Page count: 5