Distribution-free bounds for relational classification

Authors
Amit Dhurandhar
Alin Dobra
Affiliations
[1] IBM T. J. Watson
[2] University of Florida
Keywords
Data mining; Relational learning; Bounds; Classification
Abstract
Statistical relational learning (SRL) is a subarea of machine learning that addresses the problem of performing statistical inference on data that are correlated rather than independently and identically distributed (i.i.d.), as is generally assumed. In the traditional i.i.d. setting, distribution-free bounds such as the Hoeffding bound exist; they provide confidence bounds on the generalization error of a classification algorithm given its hold-out error on a sample of size N. No bounds of this form currently exist for the types of interactions in the data that relational classification algorithms consider. In this paper, we extend the Hoeffding bounds to the relational setting. In particular, we derive distribution-free bounds for certain classes of data generation models that do not produce i.i.d. data and that are based on the types of interactions considered by the relational classification algorithms developed in SRL. Empirical studies on synthetic and real data show that these data generation models are indeed realistic and that the derived bounds are tight enough for practical use.
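For reference, the classical i.i.d. Hoeffding bound that the abstract takes as its starting point can be sketched as follows. With N i.i.d. hold-out examples and 0/1 loss, P(|ê - e| >= ε) <= 2 exp(-2Nε²), where ê is the hold-out error and e the generalization error; solving for ε gives ε = sqrt(ln(2/δ) / (2N)) at confidence 1 - δ (one-sided: ln(1/δ) in place of ln(2/δ)). This is a minimal sketch under exactly that i.i.d. assumption; it is NOT the relational extension derived in the paper, and the function names are hypothetical.

```python
import math

# Classical i.i.d. Hoeffding bound only; NOT the paper's relational bound.
# Assumes N i.i.d. hold-out examples and a 0/1 loss (errors lie in [0, 1]).


def hoeffding_epsilon(n: int, delta: float, two_sided: bool = True) -> float:
    """Return eps such that, with probability >= 1 - delta,
    |holdout_error - true_error| < eps (two-sided), or
    true_error < holdout_error + eps (one-sided)."""
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie in (0, 1)")
    # Hoeffding: P(|mean - E[mean]| >= eps) <= 2 * exp(-2 * n * eps^2).
    # Setting the right-hand side to delta and solving for eps:
    k = 2.0 if two_sided else 1.0
    return math.sqrt(math.log(k / delta) / (2.0 * n))


def error_upper_bound(holdout_errors: int, n: int, delta: float) -> float:
    """One-sided upper confidence bound on the generalization error,
    clipped to 1 because an error rate cannot exceed 1."""
    empirical = holdout_errors / n
    return min(1.0, empirical + hoeffding_epsilon(n, delta, two_sided=False))


if __name__ == "__main__":
    # 100 mistakes on 1000 hold-out examples, 95% confidence.
    print(round(hoeffding_epsilon(1000, 0.05), 4))       # 0.0429
    print(round(error_upper_bound(100, 1000, 0.05), 4))  # 0.1387
```

The bound shrinks as 1/sqrt(N) and never references the data distribution, which is what "distribution-free" means here; the paper's point is that the i.i.d. assumption behind this calculation does not hold for relational data.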
Pages: 55-78
Number of pages: 23