A combinatorial approach is developed that yields tight bounds on the probability of overfitting in a number of special cases. The classical Vapnik–Chervonenkis bound is easily restated under weak probabilistic assumptions in terms of Δ, the diversity coefficient of A, which equals the number of distinct error vectors generated by all algorithms a in A. An experimental analysis of the main causes of the bound's overestimation shows that the probability of overfitting depends substantially not only on the number of distinct error vectors but also on how much they differ from one another. The set A may contain a large number of pairs of similar algorithms. In particular, most classification algorithms used in practice have a separating surface that is continuous with respect to their parameters, so nearby parameter values tend to produce error vectors that differ on only a few objects.
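The quantities named above can be made concrete on a toy error matrix. The following is a minimal Python sketch under assumptions that the abstract does not state: each algorithm is represented only by its binary error vector on a finite sample of L objects, all C(L, ℓ) splits into a training part of size ℓ and a test part are treated as equally likely, and the learner is empirical risk minimization with pessimistic tie-breaking. The function names and the toy data are hypothetical. The sketch counts Δ as the number of distinct error vectors, uses the Hamming distance between two error vectors as one possible measure of how much two algorithms differ, and computes the overfitting probability exactly by enumerating every split.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def diversity_coefficient(error_vectors):
    """Delta: the number of distinct binary error vectors in the set."""
    return len({tuple(v) for v in error_vectors})


def hamming(u, v):
    """Number of objects on which two algorithms' error indicators differ."""
    return sum(a != b for a, b in zip(u, v))


def overfitting_probability(error_vectors, train_size, eps):
    """Brute-force combinatorial overfitting probability (assumed setting).

    Every split of the L objects into train_size training objects and
    L - train_size test objects is treated as equally likely. On each split
    the learner takes a vector with the fewest training errors, breaking
    ties pessimistically (most test errors). Returns the exact fraction of
    splits on which test error rate minus training error rate is >= eps.
    """
    vectors = [tuple(v) for v in error_vectors]
    L = len(vectors[0])
    if not 0 < train_size < L:
        raise ValueError("train_size must be between 1 and L - 1")
    k = L - train_size
    bad = 0
    for train in combinations(range(L), train_size):
        train_set = set(train)
        test = [i for i in range(L) if i not in train_set]
        # (training error rate, test error rate) for every candidate vector
        scored = [(Fraction(sum(v[i] for i in train), train_size),
                   Fraction(sum(v[i] for i in test), k)) for v in vectors]
        nu_train, nu_test = min(scored, key=lambda s: (s[0], -s[1]))
        if nu_test - nu_train >= eps:
            bad += 1
    return Fraction(bad, comb(L, train_size))


if __name__ == "__main__":
    # Hypothetical toy data: rows are algorithms, columns are objects, 1 = error.
    similar = [(1, 1, 0, 0, 0, 0),
               (0, 1, 1, 0, 0, 0),
               (0, 0, 1, 1, 0, 0),
               (0, 0, 0, 1, 1, 0)]
    print(diversity_coefficient(similar))  # 4
    print([hamming(u, v) for u, v in zip(similar, similar[1:])])  # [2, 2, 2]
    print(overfitting_probability(similar, train_size=3, eps=Fraction(1, 3)))
```

Because the enumeration visits all C(L, ℓ) splits, this only scales to toy sizes, but it is enough to compare sets of error vectors that share the same Δ and differ in their pairwise distances. It computes probabilities directly rather than reproducing any of the bounds discussed above.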
Bailey, David H.
Affiliations: Univ Calif Davis, Dept Comp Sci, One Shields Ave, Davis, CA 95616 USA; Lawrence Berkeley Natl Lab, 1 Cyclotron Rd, Berkeley, CA 94720 USA
Borwein, Jonathan M.
Affiliation: Univ Newcastle, Callaghan, NSW 2308, Australia
de Prado, Marcos Lopez
Affiliations: Lawrence Berkeley Natl Lab, 1 Cyclotron Rd, Berkeley, CA 94720 USA; Guggenheim Partners, 330 Madison Ave, New York, NY 10017 USA
Zhu, Qiji Jim
Affiliation: Western Michigan Univ, Dept Math, Kalamazoo, MI 49008 USA