In certain problem domains, "The Curse of Dimensionality" (Hastie et al., 2001) is well known. It is also known as the problem of "High P and Low N", where the number of parameters far exceeds the number of samples to learn from. We describe our methods for making the most of limited samples in producing reasonably general classification rules from data with a large number of parameters. We discuss the application of this approach to classifying mesothelioma samples from baseline data according to their time to recurrence. In this case there are 12,625 inputs for each sample but only 19 samples to learn from. We reflect on the theoretical implications of the behavior of genetic programming (GP) in these extreme cases and speculate on the nature of generality.
Affiliation:
Univ Marburg, Fac Math & Comp Sci, Workgrp Numer & Optimizat, D-35032 Marburg, Germany