Convexity, classification, and risk bounds

Cited by: 734

Authors:

Bartlett, PL [1]

Jordan, MI

McAuliffe, JD

Affiliations:

[1] Univ Calif Berkeley, Dept Stat, Berkeley, CA 94720 USA

[2] Univ Calif Berkeley, Div Comp Sci, Berkeley, CA 94720 USA

Source:

JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION | 2006 / Vol. 101 / No. 473

Funding:

National Science Foundation (USA);

Keywords:

boosting; convex optimization; empirical process theory; machine learning; rademacher complexity; support vector machine;

DOI:

10.1198/016214505000000907

Chinese Library Classification:

O21 [Probability and Mathematical Statistics]; C8 [Statistics];

Subject classification codes:

020208 ; 070103 ; 0714 ;

Abstract:

Many of the classification algorithms developed in the machine learning literature, including the support vector machine and boosting, can be viewed as minimum contrast methods that minimize a convex surrogate of the 0-1 loss function. The convexity makes these algorithms computationally efficient. The use of a surrogate, however, has statistical consequences that must be balanced against the computational virtues of convexity. To study these issues, we provide a general quantitative relationship between the risk as assessed using the 0-1 loss and the risk as assessed using any nonnegative surrogate loss function. We show that this relationship gives nontrivial upper bounds on excess risk under the weakest possible condition on the loss function, namely that it satisfies a pointwise form of Fisher consistency for classification. The relationship is based on a simple variational transformation of the loss function that is easy to compute in many applications. We also present a refined version of this result in the case of low noise, and show that in this case, strictly convex loss functions lead to faster rates of convergence of the risk than would be implied by standard uniform convergence arguments. Finally, we present applications of our results to the estimation of convergence rates in function classes that are scaled convex hulls of a finite-dimensional base class, with a variety of commonly used loss functions.
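
The "variational transformation" mentioned in the abstract is not spelled out in this record. As a rough sketch only, the Python below assumes the definition from the body of the paper as commonly stated: for a convex, classification-calibrated surrogate φ, the transform is ψ(θ) = φ(0) − H((1+θ)/2), where H(η) = inf_α [η φ(α) + (1−η) φ(−α)] is the optimal conditional φ-risk, and the resulting bound reads ψ(R(f) − R*) ≤ R_φ(f) − R_φ*. The function names, the grid search, and the grid range are illustrative choices, not part of the paper. The closed forms used for comparison (ψ(θ) = |θ| for the hinge loss and ψ(θ) = 1 − sqrt(1 − θ²) for the exponential loss) are the standard ones for these two losses.

```python
import math

# Two convex surrogate losses, written as functions of the margin a = y * f(x).
def hinge(a):
    return max(0.0, 1.0 - a)

def exponential(a):
    return math.exp(-a)

def conditional_risk(phi, eta, a):
    """Conditional phi-risk of predicting a when P(Y = +1 | x) = eta."""
    return eta * phi(a) + (1.0 - eta) * phi(-a)

def psi_transform(phi, theta, grid):
    """Approximate psi(theta) = phi(0) - H((1 + theta) / 2).

    Assumption: phi is convex and classification-calibrated, so this
    simplified form of the transform applies. H is approximated by a
    brute-force minimum over the candidate values in `grid`.
    """
    eta = (1.0 + theta) / 2.0
    h = min(conditional_risk(phi, eta, a) for a in grid)
    return phi(0.0) - h

# Illustrative search grid for alpha: [-5, 5] in steps of 0.001.
GRID = [i / 1000.0 for i in range(-5000, 5001)]

if __name__ == "__main__":
    for theta in (0.0, 0.3, 0.6, 0.9):
        hinge_num = psi_transform(hinge, theta, GRID)
        exp_num = psi_transform(exponential, theta, GRID)
        # Closed forms to compare against the numerical values.
        hinge_closed = abs(theta)
        exp_closed = 1.0 - math.sqrt(1.0 - theta ** 2)
        print(theta, hinge_num, hinge_closed, exp_num, exp_closed)
```

Under these assumptions, a linear ψ (hinge) means excess surrogate risk controls excess 0-1 risk one for one, while ψ(θ) ≈ θ²/2 near zero (exponential) gives a square-root dependence for small excess risk.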

Pages: 138-156

Page count: 19

50 records in total

[1] On Convexity and Bounds of Fairness-aware Classification
Wu, Yongkai
Zhang, Lu
Wu, Xintao
WEB CONFERENCE 2019: PROCEEDINGS OF THE WORLD WIDE WEB CONFERENCE (WWW 2019), 2019, : 3356 - 3362
[2] Polynomial convexity with degree bounds
Slapar, Marko
COMPLEX VARIABLES AND ELLIPTIC EQUATIONS, 2024,
[3] BOUNDS FOR EIGENVALUES AND GENERALIZED CONVEXITY
BANKS, DO
PACIFIC JOURNAL OF MATHEMATICS, 1963, 13 (04) : 1031 - &
[4] Convexity bounds for L-functions
Heath-Brown, D. R.
ACTA ARITHMETICA, 2009, 136 (04) : 391 - 395
[5] Impressions of convexity An illustration for commutator bounds
Wenzel, David
Audenaert, Koenraad M. R.
LINEAR ALGEBRA AND ITS APPLICATIONS, 2010, 433 (11-12) : 1726 - 1759
[6] Risk Bounds for Embedded Variable Selection in Classification Trees
Gey, Servane
Mary-Huard, Tristan
IEEE TRANSACTIONS ON INFORMATION THEORY, 2014, 60 (03) : 1688 - 1699
[7] ON EXPONENTIAL BOUNDS ON THE BAYES RISK OF THE KERNEL CLASSIFICATION RULE
KRZYZAK, A
IEEE TRANSACTIONS ON INFORMATION THEORY, 1991, 37 (03) : 490 - 499
[8] Convexity, Schur-convexity and bounds for the gamma function involving the digamma function
Merkle, M
ROCKY MOUNTAIN JOURNAL OF MATHEMATICS, 1998, 28 (03) : 1053 - 1066
[9] MAHLER MEASURE FOR POLYNOMIALS AND BOUNDS FROM CONVEXITY
LELONG, P
COMPTES RENDUS DE L ACADEMIE DES SCIENCES SERIE I-MATHEMATIQUE, 1992, 315 (02) : 139 - 142
[10] BOUNDS ON THE BAYES CLASSIFICATION ERROR BASED ON PAIRWISE RISK FUNCTIONS
GARBER, FD
DJOUADI, A
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 1988, 10 (02) : 281 - 288