Upper and Lower Tight Error Bounds for Feature Omission with an Extension to Context Reduction

Cited: 0
Authors
Schlueter, Ralf [1 ]
Beck, Eugen [1 ]
Ney, Hermann [1 ]
Affiliations
[1] Rhein Westfal TH Aachen, Dept Comp Sci, Human Language Technol & Pattern Recognit, Ahornstr 55, D-52056 Aachen, Germany
Funding
European Research Council;
Keywords
Error bound; Bayes error; feature selection; language model; perplexity; context reduction; pattern classification; sequence classification; LANGUAGE; RECOGNITION;
DOI
10.1109/TPAMI.2017.2788434
CLC classification
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this work, fundamental analytic results in the form of error bounds are presented that quantify the effect of feature omission and selection for pattern classification in general, as well as the effect of context reduction in string classification, such as automatic speech recognition, printed/handwritten character recognition, or statistical machine translation. A general simulation framework is introduced that supports the discovery and proof of error bounds, and it led to the bounds presented here. Tight lower and upper bounds, initially derived for feature omission, are generalized to feature selection, followed by a further extension to context reduction of string class priors (a.k.a. language models) in string classification. For string classification, the quantitative effect of string class prior context reduction on the symbol-level Bayes error is presented. Further simulations indicate that the tightness of the original feature omission bounds seems lost in this case. However, when feature omission and context reduction are combined, the tightness of the bounds is retained. A central result of this work is a proof of the existence, and of the value, of a statistical threshold with respect to the introduction of additional features in general pattern classification, or the increase of context in string classification, beyond which a decrease in Bayes error is guaranteed.
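The core quantity in the abstract, the Bayes error, can be illustrated with a small toy sketch (all distributions and numbers below are hypothetical and for illustration only, not taken from the paper): for a discrete joint distribution over features and classes, omitting a feature marginalizes the joint, and the resulting Bayes error can never decrease.

```python
# Toy joint distribution p(x1, x2, c) over two binary features and two classes.
# Hypothetical probabilities; entries sum to 1.
p = {
    (0, 0, 0): 0.20, (0, 0, 1): 0.05,
    (0, 1, 0): 0.10, (0, 1, 1): 0.15,
    (1, 0, 0): 0.15, (1, 0, 1): 0.10,
    (1, 1, 0): 0.05, (1, 1, 1): 0.20,
}

def bayes_error(joint):
    """Bayes error 1 - sum_x max_c p(x, c) for a discrete joint over (features..., class)."""
    by_features = {}
    for (*x, c), prob in joint.items():
        cls = by_features.setdefault(tuple(x), {})
        cls[c] = cls.get(c, 0.0) + prob
    return 1.0 - sum(max(cls.values()) for cls in by_features.values())

def omit_feature(joint, idx):
    """Marginalize feature number idx out of the joint distribution."""
    reduced = {}
    for (*x, c), prob in joint.items():
        key = tuple(v for i, v in enumerate(x) if i != idx) + (c,)
        reduced[key] = reduced.get(key, 0.0) + prob
    return reduced

full = bayes_error(p)                     # error with both features
reduced = bayes_error(omit_feature(p, 1)) # error after dropping x2
# Feature omission can only increase (never decrease) the Bayes error.
assert reduced >= full
print(f"full: {full:.2f}, after omission: {reduced:.2f}")
```

The paper's contribution goes beyond this monotonicity: it gives tight bounds on how much the Bayes error can grow under omission, which a toy enumeration like this can check numerically but not prove.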
Pages: 502-514
Page count: 13