This paper addresses the problem of mining large distributed databases. We show that aggregating models, i.e., sets of disjoint classification rules each built over a subdatabase, suffices to obtain an aggregated model that is both predictive and descriptive, that offers excellent prediction capability, and that is conceptually much simpler than comparable techniques. These results are made possible by lifting the disjoint cover constraint on the aggregated model and by associating a confidence coefficient with each rule, which is then used in a weighted majority vote.
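As an illustration only, the aggregation and vote described above might be sketched as follows. The abstract does not specify how the confidence coefficient is computed or how votes are combined, so this sketch assumes that each rule carries a confidence in [0, 1] and that the vote sums the confidences of all matching rules per class. The names `Rule`, `aggregate`, and `predict`, and the example attributes, are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, Hashable, List, Optional

Record = Dict[str, float]


@dataclass
class Rule:
    condition: Callable[[Record], bool]  # hypothetical predicate over a record
    label: Hashable                      # class predicted when the rule fires
    confidence: float                    # assumed to lie in [0, 1]


def aggregate(models: List[List[Rule]]) -> List[Rule]:
    """Pool the rules of every sub-database model into one rule set.

    Rules from different models may overlap: the disjoint cover
    constraint is not enforced on the pooled set.
    """
    return [rule for model in models for rule in model]


def predict(rules: List[Rule], record: Record,
            default: Optional[Hashable] = None) -> Optional[Hashable]:
    """Weighted majority vote (assumed form): sum confidences per class."""
    votes: Dict[Hashable, float] = defaultdict(float)
    for rule in rules:
        if rule.condition(record):
            votes[rule.label] += rule.confidence
    if not votes:
        return default  # no rule covers the record
    return max(votes, key=votes.get)


# Two toy models, each a disjoint rule set built on its own sub-database.
model_a = [Rule(lambda r: r["age"] < 30, "low", 0.9),
           Rule(lambda r: r["age"] >= 30, "high", 0.6)]
model_b = [Rule(lambda r: r["income"] > 50, "high", 0.7),
           Rule(lambda r: r["income"] <= 50, "low", 0.8)]
rules = aggregate([model_a, model_b])
```

With these toy rules, the record `{"age": 25, "income": 60}` is covered by one rule from each model; the two rules disagree, and the vote settles on the class with the larger summed confidence.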