IMPROVEMENTS TO N-GRAM LANGUAGE MODEL USING TEXT GENERATED FROM NEURAL LANGUAGE MODEL

Cited: 0
Authors
Suzuki, Masayuki [1]
Itoh, Nobuyasu [1]
Nagano, Tohru [1]
Kurata, Gakuto [1]
Thomas, Samuel [1]
Affiliation
[1] IBM Res AI, Yorktown Hts, NY 10598 USA
Keywords
n-gram; RNNLM; interpolation; subword; template;
DOI
Not available
CLC number
O42 [Acoustics];
Discipline codes
070206 ; 082403 ;
Abstract
Although neural language models have emerged, n-gram language models are still used for many speech recognition tasks. This paper proposes four methods to improve n-gram language models using text generated from a recurrent neural network language model (RNNLM). First, we use multiple RNNLMs from different domains instead of a single RNNLM. The final n-gram language model is obtained by interpolating the n-gram models built from each domain's generated text. Second, we use subwords instead of words for the RNNLM to reduce the out-of-vocabulary rate. Third, we generate text templates with an RNNLM for template-based data augmentation of named entities. Fourth, we use both a forward RNNLM and a backward RNNLM to generate text. We found that these four methods improved speech recognition performance by up to 4% relative across various tasks.
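The first method in the abstract builds an n-gram model per domain from RNNLM-generated text and linearly interpolates them. A minimal sketch of such interpolation, using toy bigram models (the two tiny corpora stand in for RNNLM-generated domain text, and the 0.5/0.5 weights are illustrative, not from the paper):

```python
from collections import Counter

def train_bigram(corpus):
    """Estimate bigram probabilities P(w2 | w1) from a token list by counting."""
    bigrams = Counter(zip(corpus, corpus[1:]))
    unigrams = Counter(corpus[:-1])
    return {(w1, w2): c / unigrams[w1] for (w1, w2), c in bigrams.items()}

def interpolate(models, weights, bigram):
    """Linearly interpolate a bigram probability across several models."""
    return sum(w * m.get(bigram, 0.0) for m, w in zip(models, weights))

# Toy stand-ins for text generated by two domain-specific RNNLMs
domain_a = "the cat sat on the mat".split()
domain_b = "the dog sat on the rug".split()

models = [train_bigram(domain_a), train_bigram(domain_b)]
# P(cat | the) = 0.5 in domain A, 0.0 in domain B -> interpolated 0.25
p = interpolate(models, [0.5, 0.5], ("the", "cat"))
```

In practice the interpolation weights would be tuned on held-out data (e.g. to minimize perplexity) rather than fixed, and toolkits such as SRILM perform this mixing with smoothing and back-off, which this sketch omits.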
Pages: 7245-7249 (5 pages)
Related papers
50 items total
  • [1] Similar N-gram Language Model
    Gillot, Christian
    Cerisara, Christophe
    Langlois, David
    Haton, Jean-Paul
    [J]. 11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 3 AND 4, 2010, : 1824 - 1827
  • [2] A New Estimate of the n-gram Language Model
    Aouragh, Si Lhoussain
    Yousfi, Abdellah
    Laaroussi, Saida
    Gueddah, Hicham
    Nejja, Mohammed
    [J]. AI IN COMPUTATIONAL LINGUISTICS, 2021, 189 : 211 - 215
  • [3] Development of the N-gram Model for Azerbaijani Language
    Bannayeva, Aliya
    Aslanov, Mustafa
    [J]. 2020 IEEE 14TH INTERNATIONAL CONFERENCE ON APPLICATION OF INFORMATION AND COMMUNICATION TECHNOLOGIES (AICT2020), 2020,
  • [4] Fast Neural Network Language Model Lookups at N-Gram Speeds
    Huang, Yinghui
    Sethy, Abhinav
    Ramabhadran, Bhuvana
    [J]. 18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 274 - 278
  • [5] UNSUPERVISED LANGUAGE MODEL ADAPTATION USING N-GRAM WEIGHTING
    Haidar, Md. Akmal
    O'Shaughnessy, Douglas
    [J]. 2011 24TH CANADIAN CONFERENCE ON ELECTRICAL AND COMPUTER ENGINEERING (CCECE), 2011, : 857 - 860
  • [6] A language independent n-gram model for word segmentation
    Kang, Seung-Shik
    Hwang, Kyu-Baek
    [J]. AI 2006: ADVANCES IN ARTIFICIAL INTELLIGENCE, PROCEEDINGS, 2006, 4304 : 557+
  • [7] Combination of Random Indexing based Language Model and N-gram Language Model for Speech Recognition
    Fohr, Dominique
    Mella, Odile
    [J]. 14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5, 2013, : 2231 - 2235
  • [8] N-GRAM ANALYSIS OF TEXT DOCUMENTS IN SERBIAN LANGUAGE
    Marovac, Ulfeta
    Pljaskovic, Aldina
    Crnisanin, Adela
    Kajan, Ejub
    [J]. 2012 20TH TELECOMMUNICATIONS FORUM (TELFOR), 2012, : 1385 - 1388
  • [9] Flick: Japanese Input Method Editor using N-gram and Recurrent Neural Network Language Model based Predictive Text Input
    Ikegami, Yukino
    Sakurai, Yoshitaka
    Damiani, Ernesto
    Knauf, Rainer
    Tsuruta, Setsuo
    [J]. 2017 13TH INTERNATIONAL CONFERENCE ON SIGNAL-IMAGE TECHNOLOGY AND INTERNET-BASED SYSTEMS (SITIS), 2017, : 50 - 55
  • [10] Bangla Word Clustering Based on N-gram Language Model
    Ismail, Sabir
    Rahman, M. Shahidur
    [J]. 2014 1ST INTERNATIONAL CONFERENCE ON ELECTRICAL ENGINEERING AND INFORMATION & COMMUNICATION TECHNOLOGY (ICEEICT 2014), 2014,