Lifting the curse of dimensionality

Times cited: 1
Authors
Worzel, W. P. [1]
Almal, A. [1]
MacLean, C. D. [1]
Affiliation
[1] Genet Squared Inc, Ann Arbor, MI USA
Funding
Australian Research Council;
Keywords
microarray; mesothelioma; cancer; curse of dimensionality; classifier; genetic programming; correlation analysis; diagnostic rule; ensemble; N-fold cross-validation;
DOI
10.1007/978-0-387-49650-4_3
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
In certain problem domains, "the curse of dimensionality" (Hastie et al., 2001) is well known. Also known as the "high P, low N" problem, it arises when the number of parameters far exceeds the number of samples to learn from. We describe our methods for making the most of limited samples to produce reasonably general classification rules from data with a large number of parameters. We discuss the application of this approach to classifying mesothelioma samples from baseline data according to their time to recurrence. In this case there are 12,625 inputs for each sample but only 19 samples to learn from. We reflect on the theoretical implications of the behavior of GP in these extreme cases and speculate on the nature of generality.
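As an illustration only: with just 19 samples, every sample has to serve both for training and for evaluation, which is what N-fold cross-validation (one of the keywords above) provides. The sketch below shows one way to partition sample indices so that each sample is held out exactly once. The helper name `n_fold_splits` and the contiguous, unshuffled fold assignment are assumptions made for this sketch; it is not the authors' implementation.

```python
from typing import List, Tuple


def n_fold_splits(n_samples: int, n_folds: int) -> List[Tuple[List[int], List[int]]]:
    """Hypothetical helper: split sample indices into n_folds (train, test) pairs.

    Each index appears in exactly one test fold, and fold sizes differ by at most one.
    Setting n_folds == n_samples gives leave-one-out cross-validation.
    """
    if not 2 <= n_folds <= n_samples:
        raise ValueError("n_folds must be between 2 and n_samples")
    indices = list(range(n_samples))
    base, extra = divmod(n_samples, n_folds)
    splits = []
    start = 0
    for k in range(n_folds):
        # The first `extra` folds receive one additional sample.
        size = base + (1 if k < extra else 0)
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        splits.append((train, test))
        start += size
    return splits


# With 19 samples and 5 folds, the held-out fold sizes are 4, 4, 4, 4, 3.
for train, test in n_fold_splits(19, 5):
    print(len(train), len(test))
```

In practice the indices would usually be shuffled first and, for a two-class outcome such as early versus late recurrence, stratified by class; both steps are omitted here to keep the split deterministic.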
Pages: 29 / +
Page count: 3