Large vocabulary speech recognition of Slovenian language using morphological models

Cited by: 0

Authors:

Maucec, M [1]

Rotovnik, T [1]

Kacic, Z [1]

Horvat, B [1]

Affiliation:

[1] Univ Maribor, Inst Elect, Fac Elect Engn & Comp Sci, SI-2000 Maribor, Slovenia

Source:

IEEE REGION 8 EUROCON 2003, VOL B, PROCEEDINGS: COMPUTER AS A TOOL | 2003

Keywords:

language modelling; automatic continuous speech recognition; morphology; large vocabulary; data-driven methods; topic adaptation;

DOI:

Not available

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

This paper concerns the development of an automatic speech recognition system for the Slovenian language. The large number of unique words in inflected languages is identified as the primary reason for performance degradation. The paper focuses on statistical language models and examines a novel variation of n-gram modelling in which the modelling units are stems and endings instead of words. Only data-driven algorithms are employed to decompose words into stems and endings automatically. Modelling Slovenian with stems and endings yields a significant reduction in the out-of-vocabulary (OOV) rate. We also discuss corpus-based topic-adapted language models. Language models are most often used in topic-homogeneous environments. Topic detection in a highly inflected language is complicated by the appearance of several word forms derived from the same lemma. This problem is solved by using data-driven algorithms to group words of the same lemma into classes.
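The effect of stem-and-ending units on the OOV rate can be illustrated with a toy sketch. This is not the decomposition algorithm from the paper, which the abstract does not specify. It assumes a simple heuristic: a candidate stem is accepted if it occurs with at least two different endings in the training vocabulary, and each word is split at the stem attested with the most endings (ties go to the longer stem). The function names, the `+` marker for endings, and the tiny word lists are all hypothetical.

```python
from collections import defaultdict

def learn_stems(vocab, max_ending=3, min_stem=2):
    """Map each candidate stem to the set of endings seen after it."""
    table = defaultdict(set)
    for word in vocab:
        for k in range(1, max_ending + 1):
            if len(word) - k >= min_stem:
                table[word[:-k]].add(word[-k:])
    # Keep only stems attested with at least two different endings.
    return {s: e for s, e in table.items() if len(e) >= 2}

def split(word, stems, max_ending=3, min_stem=2):
    """Return (stem, ending); the ending is empty if no known stem fits."""
    best, best_score = (word, ""), (0, 0)
    for k in range(1, max_ending + 1):
        stem = word[:-k]
        if len(stem) >= min_stem and stem in stems:
            # Prefer stems with more attested endings, then longer stems.
            score = (len(stems[stem]), len(stem))
            if score > best_score:
                best, best_score = (stem, word[-k:]), score
    return best

def to_units(words, stems):
    """Turn a word sequence into a sequence of stem and '+ending' units."""
    units = []
    for w in words:
        stem, ending = split(w, stems)
        units.append(stem)
        if ending:
            units.append("+" + ending)
    return units

def oov_rate(tokens, known):
    """Fraction of tokens that are not in the known vocabulary."""
    return sum(t not in known for t in tokens) / len(tokens)

# Toy data: inflected forms of voda (water), miza (table), knjiga (book).
train = ["voda", "vode", "vodi", "miza", "mize", "knjiga", "knjigo"]
test = ["vodo", "mizi", "knjige", "miza"]

stems = learn_stems(set(train))
word_oov = oov_rate(test, set(train))
unit_oov = oov_rate(to_units(test, stems), set(to_units(train, stems)))
print(word_oov, unit_oov)
```

On this toy data three of the four test word forms are unseen as whole words, while every stem and ending they decompose into was seen in training. Real corpora would need a more robust segmentation criterion than this heuristic.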


Pages: 158-161

Page count: 4
