Constructing n-gram rules for natural language models through exploring the limitation of the Zipf–Mandelbrot law

Citations: 0
Author
Harry M. Chang
Affiliation
[1] AT&T Labs-Research
Source
Computing | 2011 / Vol. 91
Keywords
Zipf–Mandelbrot law; Natural language processing; n-gram statistical language models; Quantitative linguistics; 60; 62; 94
DOI
Not available
Abstract
The Zipf–Mandelbrot law is widely used to model power-law distributions over ranked data. One of its best-known applications is the linguistic analysis of the distribution of words ranked by their frequency in a text corpus. By exploring known limitations of the Zipf–Mandelbrot law in modeling actual linguistic data from different domains, in both printed media and online content, a new algorithm is developed to effectively construct n-gram rules for building the natural language (NL) models required for a human-to-computer interface. The construction of statistically oriented n-gram rules is based on a new computing algorithm that identifies the area of divergence between the Zipf–Mandelbrot curve and the actual frequency distribution of the ranked n-gram text tokens extracted from a large text corpus derived from the online electronic programming guide (EPG) for television shows and movies. Two empirical experiments were carried out to evaluate the EPG-specific language models created using the new algorithm in the context of NL-based information retrieval systems. The experimental results show the effectiveness of the algorithm for developing low-complexity concept models with high coverage of users' language for both typed and spoken queries when interacting with an NL-based EPG search interface.
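The comparison the abstract describes — a Zipf–Mandelbrot curve set against the observed frequency distribution of ranked tokens, with the divergence between the two measured rank by rank — can be sketched as follows. This is a minimal illustration, not the paper's actual algorithm: the parameter values `q` and `s`, and the choice to calibrate the constant `C` so that the curve matches the top-ranked frequency exactly, are assumptions made for the example.

```python
from collections import Counter

def zipf_mandelbrot(rank, C, q, s):
    """Frequency predicted by the Zipf-Mandelbrot law: f(r) = C / (r + q)^s."""
    return C / (rank + q) ** s

def ranked_frequencies(tokens):
    """Token frequencies sorted in descending order (rank 1 = most frequent)."""
    return sorted(Counter(tokens).values(), reverse=True)

def divergence_by_rank(freqs, q=2.7, s=1.0):
    """Relative divergence between each observed ranked frequency and the
    Zipf-Mandelbrot curve; C is calibrated so rank 1 matches exactly
    (an illustrative choice, not the paper's fitting procedure)."""
    C = freqs[0] * (1 + q) ** s
    return [abs(f - zipf_mandelbrot(r, C, q, s)) / f
            for r, f in enumerate(freqs, start=1)]

# Toy corpus standing in for the EPG-derived text corpus used in the paper.
corpus = "the cat sat on the mat the cat ran".split()
freqs = ranked_frequencies(corpus)      # descending frequency counts
div = divergence_by_rank(freqs)         # per-rank relative divergence
```

Ranks where `div` is large mark the region where the empirical distribution departs from the Zipf–Mandelbrot curve; in the paper, it is this area of divergence that drives the selection of n-gram rules.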
Pages: 241–264
Number of pages: 23