Distribution-free bounds for relational classification

Cited by: 6
Authors
Dhurandhar, Amit [1]
Dobra, Alin [2]
Affiliations
[1] IBM TJ Watson, Dept Math Sci, Yorktown Hts, NY USA
[2] Univ Florida, Gainesville, FL USA
Funding
National Science Foundation (USA)
Keywords
Data mining; Relational learning; Bounds; Classification; PROBABILITY-INEQUALITIES; LEARNABILITY; SUM;
DOI
10.1007/s10115-011-0406-4
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Statistical relational learning (SRL) is a subarea of machine learning that addresses statistical inference on data that are correlated rather than independently and identically distributed (i.i.d.), as is generally assumed. For the traditional i.i.d. setting, distribution-free bounds such as the Hoeffding bound provide confidence bounds on the generalization error of a classification algorithm given its hold-out error on a sample of size N. Bounds of this form do not currently exist for the type of interactions that relational classification algorithms consider in the data. In this paper, we extend the Hoeffding bounds to the relational setting. In particular, we derive distribution-free bounds for certain classes of data generation models that do not produce i.i.d. data and are based on the type of interactions considered by relational classification algorithms developed in SRL. We conduct empirical studies on synthetic and real data, which show that these data generation models are indeed realistic and that the derived bounds are tight enough for practical use.
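For orientation, the classical i.i.d. Hoeffding bound mentioned in the abstract can be sketched as follows. This is only the standard textbook bound for a loss bounded in [0, 1], not the relational extension derived in the paper. The function names `hoeffding_epsilon` and `hoeffding_interval` are hypothetical and chosen for illustration. The sketch uses the two-sided inequality P(|hold-out error - true error| >= eps) <= 2 exp(-2 N eps^2) and solves it for eps at confidence level 1 - delta.

```python
import math


def hoeffding_epsilon(n: int, delta: float) -> float:
    """Half-width eps such that |hold-out error - true error| < eps
    holds with probability at least 1 - delta (i.i.d. samples, 0-1 loss).

    Obtained by setting 2 * exp(-2 * n * eps**2) = delta and solving for eps.
    """
    if n <= 0:
        raise ValueError("n must be a positive sample size")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie strictly between 0 and 1")
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n))


def hoeffding_interval(holdout_error: float, n: int, delta: float) -> tuple[float, float]:
    """Confidence interval for the generalization error, clipped to [0, 1]."""
    eps = hoeffding_epsilon(n, delta)
    return max(0.0, holdout_error - eps), min(1.0, holdout_error + eps)


if __name__ == "__main__":
    # Illustrative numbers only: 12% hold-out error on N = 1000 samples.
    low, high = hoeffding_interval(0.12, 1000, 0.05)
    print(f"95% interval for the generalization error: [{low:.3f}, {high:.3f}]")
```

The half-width shrinks as 1/sqrt(N) and does not depend on the data distribution, which is what "distribution-free" means here. The i.i.d. assumption is essential to this version. The paper's contribution is a set of bounds for data generation models where that assumption fails.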
Pages: 55-78
Number of pages: 24