Towards Competitive N-gram Smoothing

Cited: 0
Authors
Falahatgar, Moein [1 ]
Ohannessian, Mesrob [2 ]
Orlitsky, Alon [1 ]
Pichapati, Venkatadheeraj [1 ]
Affiliations
[1] Univ Calif San Diego, La Jolla, CA 92093 USA
[2] Univ Illinois, Chicago, IL USA
Source
INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 108, 2020
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
N-gram models remain a fundamental component of language modeling. In data-scarce regimes, they are a strong alternative to neural models. Even when not used as-is, recent work shows they can regularize neural models. Despite this success, the effectiveness of one of the best N-gram smoothing methods, the one suggested by Kneser and Ney (1995), is not fully understood. In the hopes of explaining this performance, we study it through the lens of competitive distribution estimation: the ability to perform as well as an oracle aware of further structure in the data. We first establish basic competitive properties of Kneser-Ney smoothing. We then investigate the nature of its backoff mechanism and show that it emerges from first principles, rather than being an assumption of the model. We do this by generalizing the Good-Turing estimator to the contextual setting. This exploration leads us to a powerful generalization of Kneser-Ney, which we conjecture to have even stronger competitive properties. Empirically, it significantly improves performance on language modeling, even matching feed-forward neural models. To show that the mechanisms at play are not restricted to language modeling, we demonstrate similar gains on the task of predicting attack types in the Global Terrorism Database.
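To make the smoothing method under study concrete, the sketch below implements standard interpolated Kneser-Ney for bigrams: raw counts are absolutely discounted, and the freed probability mass backs off to a continuation distribution based on how many distinct contexts each word follows. This is a minimal textbook illustration, not the paper's generalized estimator; the function and variable names are ours.

```python
from collections import Counter

def kneser_ney_bigram(tokens, discount=0.75):
    """Return an interpolated Kneser-Ney bigram probability function.

    Illustrative only: fixed discount, bigram order, no special
    handling of sentence boundaries or out-of-vocabulary words.
    """
    bigram_counts = Counter(zip(tokens, tokens[1:]))
    context_counts = Counter(tokens[:-1])
    # Continuation count: in how many distinct contexts does w2 appear?
    continuation = Counter(w2 for (_, w2) in bigram_counts)
    total_bigram_types = len(bigram_counts)
    # Number of distinct words observed after each context.
    followers = Counter(w1 for (w1, _) in bigram_counts)

    def prob(w1, w2):
        # Lower-order continuation distribution (Kneser-Ney backoff).
        p_cont = continuation[w2] / total_bigram_types
        if context_counts[w1] == 0:
            return p_cont  # unseen context: pure backoff
        # Absolute discounting of the raw bigram count.
        discounted = max(bigram_counts[(w1, w2)] - discount, 0) / context_counts[w1]
        # Interpolation weight equals the mass freed by discounting.
        lam = discount * followers[w1] / context_counts[w1]
        return discounted + lam * p_cont

    return prob
```

For example, on the toy corpus `"a b a b a c"`, the probabilities conditioned on context `a` sum to one, with mass shifted from the frequent continuation `b` toward the rare `c` and the unseen `a`.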
Pages: 4206-4214
Page count: 9
Related Papers
(50 total)
  • [21] Semantic N-Gram Topic Modeling
    Kherwa, Pooja
    Bansal, Poonam
    EAI ENDORSED TRANSACTIONS ON SCALABLE INFORMATION SYSTEMS, 2020, 7 (26) : 1 - 12
  • [22] N-gram Analysis of a Mongolian Text
    Altangerel, Khuder
    Tsend, Ganbat
    Jalsan, Khash-Erdene
    IFOST 2008: PROCEEDING OF THE THIRD INTERNATIONAL FORUM ON STRATEGIC TECHNOLOGIES, 2008, : 258 - 259
  • [23] On compressing n-gram language models
    Hirsimaki, Teemu
    2007 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL IV, PTS 1-3, 2007, : 949 - 952
  • [24] N-GRAM ANALYSIS IN THE ENGINEERING DOMAIN
    Leary, Martin
    Pearson, Geoff
    Burvill, Colin
    Mazur, Maciej
    Subic, Aleksandar
    PROCEEDINGS OF THE 18TH INTERNATIONAL CONFERENCE ON ENGINEERING DESIGN (ICED 11): IMPACTING SOCIETY THROUGH ENGINEERING DESIGN, VOL 6: DESIGN INFORMATION AND KNOWLEDGE, 2011, 6 : 414 - 423
  • [25] Supervised N-gram Topic Model
    Kawamae, Noriaki
    WSDM'14: PROCEEDINGS OF THE 7TH ACM INTERNATIONAL CONFERENCE ON WEB SEARCH AND DATA MINING, 2014, : 473 - 482
  • [26] Discriminative n-gram language modeling
    Roark, Brian
    Saraclar, Murat
    Collins, Michael
    COMPUTER SPEECH AND LANGUAGE, 2007, 21 (02): : 373 - 392
  • [27] Similar N-gram Language Model
    Gillot, Christian
    Cerisara, Christophe
    Langlois, David
    Haton, Jean-Paul
    11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 3 AND 4, 2010, : 1824 - 1827
  • [28] Croatian Language N-Gram System
    Dembitz, Sandor
    Blaskovic, Bruno
    Gledec, Gordan
    ADVANCES IN KNOWLEDGE-BASED AND INTELLIGENT INFORMATION AND ENGINEERING SYSTEMS, 2012, 243 : 696 - 705
  • [29] A BIN-BASED ONTOLOGICAL FRAMEWORK FOR LOW-RESOURCE N-GRAM SMOOTHING IN LANGUAGE MODELLING
    Benahmed, Y.
    Selouani, S. -A.
    O'Shaughnessy, D.
    2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014,
  • [30] N-Gram Pattern Recognition using Multivariate-Bernoulli Model with Smoothing Methods for Text Classification
    Kilimci, Zeynep Hilal
    Akyokus, Selim
    2016 24TH SIGNAL PROCESSING AND COMMUNICATION APPLICATION CONFERENCE (SIU), 2016, : 597 - 600