A Novel Interpolated N-gram Language Model Based on Class Hierarchy

Authors
Lv, Zhenyu [1 ]
Liu, Wenju [1 ]
Yang, Zhanlei [1 ]
Affiliations
[1] Chinese Acad Sci, Inst Automat, Beijing, Peoples R China
Keywords
Language model; class hierarchy; cluster; interpolate; back-off
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we propose a novel interpolated language model that combines interpolation and backing-off along the classes of a class hierarchy, together with a corresponding approach for estimating the interpolation coefficients. We use the Minimum Discriminative Information (MDI) method to cluster the vocabulary hierarchically into a word-clustering tree. The tree is used to balance the generalization ability of classes against the specificity of words when estimating the likelihood of an n-gram event. Experiments are performed on the Reuters corpus using a vocabulary of 27,000 words. Results show a 12% reduction in test perplexity relative to the standard modified Kneser-Ney (KN) n-gram approach.
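As a rough illustration of how a word-clustering tree can trade class generalization against word specificity, the sketch below interpolates bigram estimates taken at each level of a small hand-made hierarchy. It is not the authors' model: the toy tree `CLASS_PATH`, the helper names, the fixed weights in `lambdas`, and the toy corpus are all assumptions made for this example, and it covers neither the back-off component nor the MDI clustering and coefficient estimation that the abstract mentions.

```python
from collections import Counter

# Hypothetical toy hierarchy: each word maps to its classes, ordered from the
# most specific (the word itself) to the most general (a single root class).
CLASS_PATH = {
    "cat": ("cat", "ANIMAL", "ROOT"),
    "dog": ("dog", "ANIMAL", "ROOT"),
    "car": ("car", "VEHICLE", "ROOT"),
    "bus": ("bus", "VEHICLE", "ROOT"),
}


def train(tokens):
    """Collect unigram and bigram counts from a token list."""
    return Counter(tokens), Counter(zip(tokens, tokens[1:]))


def level_prob(word, history, level, unigrams, bigrams):
    """Estimate P(class(word) | history) * P(word | class(word)) at one tree level."""
    target = CLASS_PATH[word][level]
    members = [w for w in CLASS_PATH if CLASS_PATH[w][level] == target]
    hist_total = sum(bigrams[(history, w)] for w in CLASS_PATH)
    class_total = sum(unigrams[w] for w in members)
    if hist_total == 0 or class_total == 0:
        return 0.0  # no evidence at this level
    class_given_hist = sum(bigrams[(history, w)] for w in members) / hist_total
    word_given_class = unigrams[word] / class_total
    return class_given_hist * word_given_class


def interpolated_prob(word, history, lambdas, unigrams, bigrams):
    """Convex combination of the per-level estimates (weights assumed to sum to 1)."""
    return sum(
        lam * level_prob(word, history, level, unigrams, bigrams)
        for level, lam in enumerate(lambdas)
    )


# Toy corpus and hand-picked weights, both assumptions for illustration only.
tokens = "cat dog car bus cat car dog bus cat dog".split()
unigrams, bigrams = train(tokens)
lambdas = (0.5, 0.3, 0.2)  # word level, class level, root level

# "cat bus" never occurs in the toy corpus, so the word-level estimate is zero,
# but the class and root levels still assign the bigram some probability mass.
p_unseen = interpolated_prob("bus", "cat", lambdas, unigrams, bigrams)
```

Because each level yields a proper distribution over the vocabulary for a seen history, any convex combination of the levels is also a proper distribution; choosing the weights well is the part the paper addresses and this sketch does not.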
Pages: 473-477
Number of pages: 5
Related papers
50 items in total
  • [21] An N-gram based model for predicting of word-formation in Assamese language
    Bhuyan, M. P.
    Sarma, S. K.
    JOURNAL OF INFORMATION & OPTIMIZATION SCIENCES, 2019, 40(2): 427-440
  • [22] A variable-length category-based n-gram language model
    Niesler, TR
    Woodland, PC
    1996 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, CONFERENCE PROCEEDINGS, VOLS 1-6, 1996: 164-167
  • [23] Managed N-gram Language Model Based on Hadoop Framework and a Hbase Tables
    Allam, Tahani Mahmoud
    Sallam, Alsayed Abdelhameed
    Abdullkader, Hatem M.
    2014 9TH INTERNATIONAL CONFERENCE ON INFORMATICS AND SYSTEMS (INFOS), 2014
  • [24] On compressing n-gram language models
    Hirsimaki, Teemu
    2007 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL IV, PTS 1-3, 2007: 949-952
  • [25] Bayesian estimation methods for N-gram language model adaptation
    Federico, M
    ICSLP 96 - FOURTH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, VOLS 1-4, 1996: 240-243
  • [26] Discriminative n-gram language modeling
    Roark, Brian
    Saraclar, Murat
    Collins, Michael
    COMPUTER SPEECH AND LANGUAGE, 2007, 21(2): 373-392
  • [27] Croatian Language N-Gram System
    Dembitz, Sandor
    Blaskovic, Bruno
    Gledec, Gordan
    ADVANCES IN KNOWLEDGE-BASED AND INTELLIGENT INFORMATION AND ENGINEERING SYSTEMS, 2012, 243: 696-705
  • [28] DOCUMENT-BASED DIRICHLET CLASS LANGUAGE MODEL FOR SPEECH RECOGNITION USING DOCUMENT-BASED N-GRAM EVENTS
    Haidar, Md. Akmal
    O'Shaughnessy, Douglas
    2014 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY SLT 2014, 2014: 42-47
  • [29] A WEIGHTED AVERAGE N-GRAM MODEL OF NATURAL-LANGUAGE
    OBOYLE, P
    OWENS, M
    SMITH, FJ
    COMPUTER SPEECH AND LANGUAGE, 1994, 8(4): 337-349
  • [30] UNSUPERVISED LANGUAGE MODEL ADAPTATION USING N-GRAM WEIGHTING
    Haidar, Md. Akmal
    O'Shaughnessy, Douglas
    2011 24TH CANADIAN CONFERENCE ON ELECTRICAL AND COMPUTER ENGINEERING (CCECE), 2011: 857-860