Improved subword modeling for WFST-based speech recognition

Cited by: 25

Authors:

Smit, Peter [1]

Virpioja, Sami [1]

Kurimo, Mikko [1]

Affiliations:

[1] Aalto Univ, Dept Signal Proc & Acoust, Helsinki, Finland

Source:

18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION | 2017

Funding:

Academy of Finland;

Keywords:

speech recognition; Kaldi; subword modeling; Finnish; Estonian;

DOI:

10.21437/Interspeech.2017-103

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Because the number of observed word forms in agglutinative languages is very high, subword units are often used in speech recognition. However, the proper use of subword units requires careful consideration of details such as silence modeling, position-dependent phones, and combination of the units. In this paper, we implement subword modeling in the Kaldi toolkit by creating a modified lexicon with finite-state transducers to represent the subword units correctly. We experiment with multiple types of word boundary markers and achieve the best results by adding a marker to the left or right side of a subword unit whenever it is not preceded or followed by a word boundary, respectively. We also compare three different toolkits that provide data-driven subword segmentations. In our experiments on a variety of Finnish and Estonian datasets, the best subword models outperform word-based models and naive subword implementations. The largest relative reduction in WER is 23% over word-based models for a Finnish read speech dataset. The results are also better than any previously published ones for the same datasets, and the improvement on all datasets is more than 5%.
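The boundary-marker scheme that the abstract reports as best (a marker on the left or right side of a subword unit whenever that side is not at a word boundary) can be illustrated with a minimal Python sketch. This is not the authors' Kaldi/FST implementation: the marker symbol `+`, the function names `mark_subwords` and `join_subwords`, and the example segmentation are assumptions chosen for illustration only.

```python
from typing import List

MARKER = "+"  # hypothetical marker symbol; the abstract does not name one


def mark_subwords(words: List[List[str]]) -> List[str]:
    """Flatten segmented words into one subword sequence.

    Each inner list holds the subword pieces of one word. A piece gets
    MARKER on its left if it is not word-initial and on its right if it
    is not word-final, so unsegmented words stay unmarked.
    """
    marked = []
    for pieces in words:
        last = len(pieces) - 1
        for i, piece in enumerate(pieces):
            left = MARKER if i > 0 else ""
            right = MARKER if i < last else ""
            marked.append(f"{left}{piece}{right}")
    return marked


def join_subwords(units: List[str]) -> List[str]:
    """Rebuild words from a marked subword sequence."""
    words, current = [], ""
    for unit in units:
        continues = unit.endswith(MARKER)
        core = unit
        if core.startswith(MARKER):
            core = core[len(MARKER):]
        if continues:
            core = core[:-len(MARKER)]
        current += core
        if not continues:
            # No right-side marker: this unit ends the word.
            words.append(current)
            current = ""
    if current:
        # Tolerate a sequence that ends mid-word.
        words.append(current)
    return words


if __name__ == "__main__":
    # Assumed segmentation of the Finnish phrase "talossa on".
    units = mark_subwords([["talo", "ssa"], ["on"]])
    print(units)
    print(join_subwords(units))
```

Because both sides of every word-internal join carry a marker, word boundaries can be recovered from the subword sequence alone, without a separate word-boundary token.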


Pages: 2551-2555

Number of pages: 5

50 items in total

[21] Subword unit based speech recognition in car environments
Fischer, A
Stahl, V
PROCEEDINGS OF THE 1998 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-6, 1998, : 257 - 260
[22] Evaluation of a WFST-based ASR system for train timetable information
Department of Computer Science, Tokyo Institute of Technology, 152-8552 Tokyo, Japan
Unknown
APSIPA ASC - Asia-Pac. Signal Inf. Process. Assoc. Annu. Summit Conf., (648-651):
[23] STATISTICAL DIALOG MANAGEMENT APPLIED TO WFST-BASED DIALOG SYSTEMS
Hori, Chiori
Ohtake, Kiyonori
Misu, Teruhisa
Kashioka, Hideki
Nakamura, Satoshi
2009 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1- 8, PROCEEDINGS, 2009, : 4793 - 4796
[24] A study on task-independent subword selection and modeling for speech recognition
Lee, CH
Juang, BH
Chou, W
MolinaPerez, JJ
ICSLP 96 - FOURTH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, VOLS 1-4, 1996, : 1820 - 1823
[25] A WFST-based Log-linear Framework for Speaking-style Transformation
Neubig, Graham
Mori, Shinsuke
Kawahara, Tatsuya
INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 1503 - 1506
[26] Lexicon Adaptation for Subword Speech Recognition
Mertens, Timo
Schneider, Daniel
Naess, Arild Brandrud
Svendsen, Torbjorn
2009 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION & UNDERSTANDING (ASRU 2009), 2009, : 562 - +
[27] Subword Speech Recognition for Agglutinative Languages
Valizada, Alakbar
2021 IEEE 15TH INTERNATIONAL CONFERENCE ON APPLICATION OF INFORMATION AND COMMUNICATION TECHNOLOGIES (AICT2021), 2021,
[28] SUBWORD-BASED LARGE-VOCABULARY SPEECH RECOGNITION
LEE, CH
GAUVAIN, JL
PIERACCINI, R
RABINER, LR
AT&T TECHNICAL JOURNAL, 1993, 72 (05): : 25 - 36
[29] Expansion of WFST-Based Dialog Management for Handling Multiple ASR Hypotheses
Kimura, Naoto
Hori, Chiori
Misu, Teruhisa
Ohtake, Kiyonori
Kawai, Hisashi
Nakamura, Satoshi
SPOKEN DIALOGUE SYSTEMS FOR AMBIENT ENVIRONMENTS, 2010, 6392 : 61 - 72
[30] ACOUSTIC MODELING OF SUBWORD UNITS FOR LARGE VOCABULARY SPEAKER INDEPENDENT SPEECH RECOGNITION
LEE, CH
RABINER, LR
PIERACCINI, R
WILPON, JG
SPEECH AND NATURAL LANGUAGE, 1989, : 280 - 291
