A language model adaptation using multiple varied corpora

Authors
Yamamoto, H [1]
Sagisaka, Y [1]
Affiliations
[1] ATR, Spoken Language Translation Research Laboratories, Kyoto 619-0288, Japan
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
A new language model adaptation scheme is proposed to cope with multiple varied speech recognition tasks. The proposed adaptation reflects both topic differences and differences in sentence style that result from the speaker's role. Adaptation is carried out using two different language corpora, each of which matches the target in only the topic or only the speaker's style. New word clustering techniques are introduced to extract the topic and style dependencies separately. In this clustering, word neighboring characteristics in the two adaptation source corpora are regarded as different features. All words are classified into commonly used word classes and topic- or style-dependent classes. Furthermore, words that depend on the target topic and sentence style, together with their neighboring characteristics, are emphasized according to their frequency in the adaptation target data. In the evaluation experiment, the proposed method shows a 13% lower perplexity and a 9% lower word error rate in continuous speech recognition compared with the conventional adaptation method.
Pages: 389-392
Number of pages: 4