Language modeling with probabilistic left corner parsing

Cited by: 5
Authors
Van Uytsel, DH [1 ]
Van Compernolle, D [1 ]
Affiliation
[1] Katholieke Univ Leuven, ESAT, B-3001 Heverlee, Belgium
Source
COMPUTER SPEECH AND LANGUAGE | 2005 / Vol. 19 / Issue 02
DOI
10.1016/j.csl.2004.05.009
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We present a novel language model, suitable for large-vocabulary continuous speech recognition, based on parsing with a probabilistic left corner grammar (PLCG). The PLCG probabilities are conditioned on local and non-local features of the partial parse tree, and some of these features are lexical. The probabilities are not derived from another stochastic grammar but are induced directly from a treebank, that is, a corpus of text sentences annotated with parse trees. A context-enriched constituent represents all partial parse trees that are equivalent with respect to the probability of the next parse move. For computational efficiency, the parsing problem is represented as a traversal through a compact stochastic network of constituents connected by PLCG moves. The efficiency of the algorithm is due to the fact that the network consists of recursively nested, shared subnetworks. The PLCG-based language model results from accumulating the probabilities of all (partial) paths through this network. Next-word probabilities can be computed synchronously with the probabilistic left corner parsing algorithm in a single pass from left to right. They are guaranteed to be normalized, even when less likely paths are pruned. Finally, it is shown experimentally that the PLCG-based language model is a competitive alternative to other syntax-based language models, both in efficiency and accuracy. (C) 2004 Elsevier Ltd. All rights reserved.
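The abstract's claim that next-word probabilities stay normalized under pruning can be illustrated with a small sketch. This is not the PLCG algorithm from the paper: the states, the `toy_moves` table, and the function names below are hypothetical, and the renormalization over surviving hypotheses is one common way to obtain the guarantee, assumed here for illustration only. Each hypothesis stands in for a class of partial parses with an accumulated path probability, and each move emits one word.

```python
from collections import defaultdict

# Hypothetical toy move table: state -> {(word, next_state): probability}.
# The probabilities leaving each state sum to 1.
TOY_TABLE = {
    "S0": {("the", "NP"): 0.6, ("a", "NP"): 0.3, ("dogs", "VP"): 0.1},
    "NP": {("dog", "VP"): 0.5, ("cat", "VP"): 0.4, ("cat", "NP2"): 0.1},
    "VP": {("barks", "END"): 0.7, ("sleeps", "END"): 0.3},
    "NP2": {("sleeps", "END"): 1.0},
    "END": {("</s>", "END"): 1.0},
}


def toy_moves(state):
    """Return the outgoing moves of a state (illustrative stand-in for parse moves)."""
    return TOY_TABLE[state]


def next_word_distribution(hypotheses, move_model):
    """Accumulate next-word probabilities over all surviving hypotheses.

    hypotheses maps state -> accumulated (possibly pruned) path probability.
    Dividing by the surviving mass makes the result sum to 1 regardless of
    how much probability pruning has discarded.
    """
    total = sum(hypotheses.values())
    if total <= 0.0:
        raise ValueError("no surviving hypotheses")
    dist = defaultdict(float)
    for state, mass in hypotheses.items():
        for (word, _next_state), p in move_model(state).items():
            dist[word] += (mass / total) * p
    return dict(dist)


def advance(hypotheses, move_model, word, beam=2):
    """Consume one word, merge paths reaching the same state, keep the best `beam`."""
    successors = defaultdict(float)
    for state, mass in hypotheses.items():
        for (w, next_state), p in move_model(state).items():
            if w == word:
                successors[next_state] += mass * p
    kept = sorted(successors.items(), key=lambda kv: kv[1], reverse=True)[:beam]
    return dict(kept)


if __name__ == "__main__":
    hyps = {"S0": 1.0}
    for word in ["the", "cat"]:
        hyps = advance(hyps, toy_moves, word, beam=1)  # aggressive pruning
    dist = next_word_distribution(hyps, toy_moves)
    print(dist, sum(dist.values()))
```

With `beam=1` the hypothesis reaching `NP2` is discarded after "cat", yet the distribution over the next word still sums to one because it is computed relative to the surviving probability mass.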
Pages: 171-204
Page count: 34