Tree Induction for Probability-Based Ranking

Cited by: 1

Authors:

Foster Provost

Pedro Domingos

Affiliations:

[1] New York University

[2] University of Washington

Source:

Machine Learning | 2003 / Vol. 52

Keywords:

ranking; probability estimation; classification; cost-sensitive learning; decision trees; Laplace correction; bagging

DOI:

Not available

Abstract:

Tree induction is one of the most effective and widely used methods for building classification models. However, many applications require cases to be ranked by the probability of class membership. Probability estimation trees (PETs) have the same attractive features as classification trees (e.g., comprehensibility, accuracy and efficiency in high dimensions and on large data sets). Unfortunately, decision trees have been found to provide poor probability estimates. Several techniques have been proposed to build more accurate PETs, but, to our knowledge, there has not been a systematic experimental analysis of which techniques actually improve the probability-based rankings, and by how much. In this paper we first discuss why the decision-tree representation is not intrinsically inadequate for probability estimation. Inaccurate probabilities are partially the result of decision-tree induction algorithms that focus on maximizing classification accuracy and minimizing tree size (for example via reduced-error pruning). Larger trees can be better for probability estimation, even if the extra size is superfluous for accuracy maximization. We then present the results of a comprehensive set of experiments, testing some straightforward methods for improving probability-based rankings. We show that using a simple, common smoothing method—the Laplace correction—uniformly improves probability-based rankings. In addition, bagging substantially improves the rankings, and is even more effective for this purpose than for improving accuracy. We conclude that PETs, with these simple modifications, should be considered when rankings based on class-membership probability are required.
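The two remedies named in the abstract, the Laplace correction and bagging, can be illustrated with a minimal sketch. It assumes the usual form of the Laplace correction, (k + 1) / (n + C) for a leaf holding k examples of the class of interest out of n, with C classes, and it treats bagging as a plain average of per-tree estimates. The function names and the leaf counts are hypothetical and are not taken from the paper.

```python
def raw_frequency(k, n):
    """Unsmoothed leaf estimate: fraction of the leaf's n examples in the class."""
    return k / n


def laplace_estimate(k, n, num_classes=2):
    """Laplace-corrected leaf estimate: (k + 1) / (n + C)."""
    return (k + 1) / (n + num_classes)


def bagged_estimate(per_tree_estimates):
    """Average the estimates that several trees give for the same case."""
    return sum(per_tree_estimates) / len(per_tree_estimates)


# Two hypothetical pure leaves: a small one (3 of 3) and a large one (50 of 50).
small_leaf = (3, 3)
large_leaf = (50, 50)

# Raw frequencies are identical, so cases in the two leaves cannot be ranked apart.
print(raw_frequency(*small_leaf), raw_frequency(*large_leaf))

# The Laplace correction shrinks the small leaf more strongly toward 1/C,
# so the better-supported leaf ranks higher.
print(laplace_estimate(*small_leaf), laplace_estimate(*large_leaf))

# Averaging over several hypothetical trees gives one score for a single case.
per_tree = [laplace_estimate(3, 3), laplace_estimate(4, 6), laplace_estimate(50, 50)]
print(bagged_estimate(per_tree))
```

In this sketch the smoothing only matters for ranking because it separates leaves that raw frequencies would tie; the averaging step further increases the number of distinct scores a case can receive.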

Pages: 199-215

Page count: 16
