A Novel Interpolated N-gram Language Model Based on Class Hierarchy

被引:0
|
作者
Lv, Zhenyu [1 ]
Liu, Wenju [1 ]
Yang, Zhanlei [1 ]
机构
[1] Chinese Acad Sci, Inst Automat, Beijing, Peoples R China
关键词
Language model; class hierarchy; cluster; interpolate; back-off;
D O I
暂无
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
In this paper, we propose a novel interpolated language model that combines the interpolation and the backing-off along hierarchical classes based on class hierarchy. And the corresponding approach to the estimation of interpolation coefficients is also presented. We use the Minimum Discriminative Information (MDI) method to cluster the vocabulary into a word-clustering tree hierarchically. The tree is used to balance the generalization ability of classes' and word specificity when estimating the likelihood of a n-gram event. Experiments are performed on Reuter's corpus using a vocabulary of 27,000 words. Results show a reduction on the test perplexity over the standard Modified KN n-gram approach by 12%.
引用
收藏
页码:473 / 477
页数:5
相关论文
共 50 条
  • [1] Topic-Dependent-Class-Based n-Gram Language Model
    Naptali, Welly
    Tsuchiya, Masatoshi
    Nakagawa, Seiichi
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2012, 20 (05): : 1513 - 1525
  • [2] Hierarchical linear discounting class n-gram language models: A multilevel class hierarchy approach
    Zitouni, Imed
    Zhou, Qiru
    2008 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-12, 2008, : 4917 - +
  • [3] Multi-class composite N-gram language model
    Yamamoto, H
    Isogai, S
    Sagisaka, Y
    SPEECH COMMUNICATION, 2003, 41 (2-3) : 369 - 379
  • [4] Similar N-gram Language Model
    Gillot, Christian
    Cerisara, Christophe
    Langlois, David
    Haton, Jean-Paul
    11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 3 AND 4, 2010, : 1824 - 1827
  • [5] Bangla Word Clustering Based on N-gram Language Model
    Ismail, Sabir
    Rahman, M. Shahidur
    2014 1ST INTERNATIONAL CONFERENCE ON ELECTRICAL ENGINEERING AND INFORMATION & COMMUNICATION TECHNOLOGY (ICEEICT 2014), 2014,
  • [6] Multilingual stochastic n-gram class language models
    Jardino, M
    1996 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, CONFERENCE PROCEEDINGS, VOLS 1-6, 1996, : 161 - 163
  • [7] A New Estimate of the n-gram Language Model
    Aouragh, Si Lhoussain
    Yousfi, Abdellah
    Laaroussi, Saida
    Gueddah, Hicham
    Nejja, Mohammed
    AI IN COMPUTATIONAL LINGUISTICS, 2021, 189 : 211 - 215
  • [8] Development of the N-gram Model for Azerbaijani Language
    Bannayeva, Aliya
    Aslanov, Mustafa
    2020 IEEE 14TH INTERNATIONAL CONFERENCE ON APPLICATION OF INFORMATION AND COMMUNICATION TECHNOLOGIES (AICT2020), 2020,
  • [9] A variant of n-gram based language classification
    Tomovic, Andrija
    Janicic, Predrag
    AI(ASTERISK)IA 2007: ARTIFICIAL INTELLIGENCE AND HUMAN-ORIENTED COMPUTING, 2007, 4733 : 410 - +
  • [10] Multiclass composite N-gram language model based on connection direction
    Yamamoto, Hirofumi
    Sagisaka, Yoshinori
    Systems and Computers in Japan, 2003, 34 (07) : 108 - 114