Profile based compression of n-gram language models

Cited by: 0
Authors:
Olsen, Jesper [1]
Oria, Daniela [1]
Affiliations:
[1] Nokia Res Ctr, Multimedia Technol Lab, Helsinki, Finland
Keywords:
n-gram language models; dictation; embedded systems
DOI: not available
Chinese Library Classification: O42 [Acoustics]
Discipline codes: 070206; 082403
Abstract
A profile-based technique for compressing n-gram language models is presented. The technique is intended to be used alongside existing size-reduction techniques for n-gram language models such as pruning, quantisation, and word-class modelling. It is evaluated here on a large-vocabulary embedded dictation task. When combined with quantisation, the technique can reduce the memory needed for storing probabilities by a factor of 10 or more, with only a small degradation in word accuracy. The structure of the language model is well suited to "best-first" decoding styles, and is used here to guide an isolated-word recogniser by predicting likely continuations at word boundaries.
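The abstract does not spell out the quantisation scheme it is combined with, but the basic idea behind quantising n-gram probabilities is standard: replace each stored floating-point log-probability (4–8 bytes) with a 1-byte index into a small shared codebook. The sketch below is a minimal, hypothetical illustration of uniform scalar quantisation, not the paper's actual method; all names are invented for this example.

```python
def quantise_logprobs(logprobs, n_levels=256):
    """Uniformly quantise float log-probabilities into small integer codes.

    Each value is replaced by an index into a shared codebook of
    n_levels representatives, so per-probability storage drops from
    4-8 bytes to 1 byte when n_levels <= 256. Returns (codes, codebook).
    """
    lo, hi = min(logprobs), max(logprobs)
    # Guard against a constant input, where the range would be zero.
    step = (hi - lo) / (n_levels - 1) if hi > lo else 1.0
    codes = [round((lp - lo) / step) for lp in logprobs]
    codebook = [lo + i * step for i in range(n_levels)]
    return codes, codebook


def dequantise(codes, codebook):
    """Map byte codes back to approximate log-probabilities."""
    return [codebook[c] for c in codes]
```

With uniform quantisation the reconstruction error per value is bounded by half the step size, which is why a coarse codebook (even 4–8 bits per probability) can shrink the probability tables by the order of magnitude the abstract reports while costing only a small amount of accuracy.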
Pages: 1041-1044 (4 pages)