Multi-class composite N-gram based on connection direction

Cited by: 12
Authors
Yamamoto, H [1]
Sagisaka, Y [1]
Affiliation
[1] ATR Interpreting Telecommun Res Labs, Seika, Kyoto 6190288, Japan
DOI
10.1109/ICASSP.1999.758180
Chinese Library Classification
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
A new word-clustering technique is proposed to efficiently build statistically salient class 2-grams from language corpora. By splitting each word's neighboring characteristics into the preceding and following directions, multiple (two-dimensional) word classes are assigned to each word. On each side, word classes are merged into larger clusters independently, according to the distribution of preceding or following words. This word clustering provides more efficient and statistically reliable word clusters. Further, we extend it to a Multi-Class Composite N-gram whose units are Multi-Class 2-grams and joined words. The Multi-Class Composite N-gram showed better performance in both perplexity and recognition rate, with a one-thousandth smaller size than conventional word 2-grams.
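The idea of giving each word two classes, one per connection direction, can be illustrated with a minimal Python sketch. The abstract does not state the probability formula, so the factorization used here, P(w_i | w_{i-1}) ≈ P(C_to(w_i) | C_from(w_{i-1})) · P(w_i | C_to(w_i)), is an assumption, as are the toy corpus, the hand-assigned class maps, and the names `train_multiclass_bigram` and `prob`. The automatic merging of classes described in the abstract is not implemented.

```python
from collections import Counter

def train_multiclass_bigram(sentences, to_class, from_class):
    """Collect counts for a class 2-gram with direction-specific classes.

    to_class[w]:   class of w as the predicted word (grouped by preceding words)
    from_class[w]: class of w as the history word (grouped by following words)
    """
    pair_counts = Counter()  # (from-class of previous word, to-class of current word)
    from_counts = Counter()  # from-class of previous word
    word_counts = Counter()  # current word
    to_counts = Counter()    # to-class of current word
    for sent in sentences:
        for prev, cur in zip(sent, sent[1:]):
            pair_counts[(from_class[prev], to_class[cur])] += 1
            from_counts[from_class[prev]] += 1
            word_counts[cur] += 1
            to_counts[to_class[cur]] += 1
    return pair_counts, from_counts, word_counts, to_counts

def prob(cur, prev, model, to_class, from_class):
    """Assumed factorization: P(C_to(cur) | C_from(prev)) * P(cur | C_to(cur))."""
    pair_counts, from_counts, word_counts, to_counts = model
    cf, ct = from_class[prev], to_class[cur]
    if from_counts[cf] == 0 or to_counts[ct] == 0:
        return 0.0
    class_transition = pair_counts[(cf, ct)] / from_counts[cf]
    word_given_class = word_counts[cur] / to_counts[ct]
    return class_transition * word_given_class

# Hypothetical toy corpus and hand-assigned classes (illustration only).
sentences = [["a", "dog", "runs"], ["an", "apple", "falls"], ["a", "cat", "runs"]]

# "a" and "an" share a to-class but have different from-classes,
# because the words that follow them differ.
to_class = {"a": "ART", "an": "ART", "dog": "N_CONS", "cat": "N_CONS",
            "apple": "N_VOWEL", "runs": "V", "falls": "V"}
# The nouns share a from-class (all are followed by verbs) but have
# different to-classes, because the articles that precede them differ.
from_class = {"a": "ART_A", "an": "ART_AN", "dog": "N", "cat": "N",
              "apple": "N", "runs": "V", "falls": "V"}

model = train_multiclass_bigram(sentences, to_class, from_class)
p_dog_after_a = prob("dog", "a", model, to_class, from_class)
p_apple_after_a = prob("apple", "a", model, to_class, from_class)
p_falls_after_dog = prob("falls", "dog", model, to_class, from_class)
```

In this toy setup the unseen word pair "dog falls" still receives a nonzero probability, because "dog" shares its from-class with "apple", while "a apple" receives zero, because "apple" has its own to-class. A single class per word could not make both distinctions at once.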
Pages: 533-536
Number of pages: 4