A language model adaptation using multiple varied corpora

Authors
Yamamoto, H [1 ]
Sagisaka, Y [1 ]
Affiliation
[1] ATR Spoken Language Translation Research Laboratories, Kyoto 619-0288, Japan
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
A new language model adaptation scheme is proposed to cope with multiple varied speech recognition tasks. The proposed adaptation reflects both topic differences and differences in sentence style that result from the speaker's role. Adaptation is carried out using two different language corpora, in each of which only the topic or only the speaker's style matches the target task. New word clustering techniques are introduced to extract the topic dependency and the style dependency separately. In this clustering, the word neighboring characteristics in the two adaptation source corpora are regarded as different features. All words are classified into commonly used word classes and topic- or style-dependent classes. Furthermore, words that depend on the target topic and sentence style, together with their neighboring characteristics, are emphasized according to their frequency in the adaptation target data. In the evaluation experiment, the proposed method yields 13% lower perplexity and a 9% lower word error rate in continuous speech recognition than the conventional adaptation method.
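The abstract does not state the exact clustering criterion, so the following is only an illustrative sketch, not the authors' algorithm. It shows one way to compare a word's neighboring characteristics across the two adaptation sources. For each word it counts left and right neighbors in the topic-matched corpus and in the style-matched corpus. If the two neighbor profiles are similar, the word is called "common". Otherwise the word is assigned to whichever source gives it more neighbor observations. The cosine similarity measure, the 0.5 threshold, the tie-break rule, the function names (`neighbor_profile`, `cosine`, `classify`) and the toy corpora are all hypothetical. The frequency-based emphasis using the target data is not shown.

```python
from collections import Counter
import math

Sentence = list[str]


def neighbor_profile(corpus: list[Sentence], word: str) -> Counter:
    """Count the left ("L:") and right ("R:") neighbors of `word` in a tokenized corpus."""
    profile: Counter = Counter()
    for sent in corpus:
        for i, tok in enumerate(sent):
            if tok != word:
                continue
            if i > 0:
                profile["L:" + sent[i - 1]] += 1
            if i + 1 < len(sent):
                profile["R:" + sent[i + 1]] += 1
    return profile


def cosine(p: Counter, q: Counter) -> float:
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(v * q[k] for k, v in p.items())  # Counter yields 0 for missing keys
    norm = math.sqrt(sum(v * v for v in p.values())) * math.sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0


def classify(word: str, topic_corpus: list[Sentence], style_corpus: list[Sentence],
             threshold: float = 0.5) -> str:
    """Hypothetical rule returning 'common', 'topic', 'style', or 'unseen'."""
    p_topic = neighbor_profile(topic_corpus, word)
    p_style = neighbor_profile(style_corpus, word)
    if not p_topic and not p_style:
        return "unseen"  # no neighbor observations in either source
    # Similar neighboring behavior in both sources: treat as a commonly used word.
    if cosine(p_topic, p_style) >= threshold:
        return "common"
    # Otherwise attribute the word to the source with more neighbor observations.
    return "topic" if sum(p_topic.values()) >= sum(p_style.values()) else "style"


# Invented toy data: same topic / other speaker role vs. same speaker role / other topic.
topic_corpus = [["i", "want", "a", "room"], ["the", "room", "is", "free"]]
style_corpus = [["certainly", "the", "seat", "is", "free"], ["certainly", "sir"]]

for w in ["free", "room", "certainly", "hotel"]:
    print(w, classify(w, topic_corpus, style_corpus))
```

On this toy data, "free" is labeled common because it follows "is" in both sources. "room" is labeled topic-dependent and "certainly" style-dependent, and "hotel" is unseen. A real system would cluster words into classes rather than labeling them one at a time.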
Pages
389-392 (4 pages)