Tree Induction for Probability-Based Ranking

Cited by: 1
Authors
Foster Provost
Pedro Domingos
Affiliations
[1] New York University
[2] University of Washington
Source
Machine Learning, 2003, Vol. 52
Keywords
ranking; probability estimation; classification; cost-sensitive learning; decision trees; Laplace correction; bagging
DOI
Not available
Abstract
Tree induction is one of the most effective and widely used methods for building classification models. However, many applications require cases to be ranked by the probability of class membership. Probability estimation trees (PETs) have the same attractive features as classification trees (e.g., comprehensibility, accuracy and efficiency in high dimensions and on large data sets). Unfortunately, decision trees have been found to provide poor probability estimates. Several techniques have been proposed to build more accurate PETs, but, to our knowledge, there has not been a systematic experimental analysis of which techniques actually improve the probability-based rankings, and by how much. In this paper we first discuss why the decision-tree representation is not intrinsically inadequate for probability estimation. Inaccurate probabilities are partially the result of decision-tree induction algorithms that focus on maximizing classification accuracy and minimizing tree size (for example via reduced-error pruning). Larger trees can be better for probability estimation, even if the extra size is superfluous for accuracy maximization. We then present the results of a comprehensive set of experiments, testing some straightforward methods for improving probability-based rankings. We show that using a simple, common smoothing method—the Laplace correction—uniformly improves probability-based rankings. In addition, bagging substantially improves the rankings, and is even more effective for this purpose than for improving accuracy. We conclude that PETs, with these simple modifications, should be considered when rankings based on class-membership probability are required.
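The two modifications named in the abstract can be illustrated with a minimal sketch. It assumes the usual form of the Laplace correction, (k + 1) / (n + C), for a leaf holding n training examples, k of them in the class of interest, with C classes, and it assumes that bagging combines trees by averaging their per-tree estimates. The function names and the leaf counts are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
def raw_estimate(k, n):
    """Unsmoothed leaf frequency: k positives among n training examples."""
    return k / n


def laplace_estimate(k, n, num_classes=2):
    """Laplace-corrected leaf estimate (assumed form: (k + 1) / (n + C))."""
    return (k + 1) / (n + num_classes)


def bagged_estimate(leaf_counts, num_classes=2):
    """Average the Laplace estimates of the leaves a case reaches in each tree.

    leaf_counts is a list of (k, n) pairs, one per bagged tree (hypothetical data).
    """
    estimates = [laplace_estimate(k, n, num_classes) for k, n in leaf_counts]
    return sum(estimates) / len(estimates)


# Two pure leaves: raw frequencies tie at 1.0, so they cannot be ranked.
small_leaf = (2, 2)    # 2 positives out of 2 examples
large_leaf = (50, 50)  # 50 positives out of 50 examples

print(raw_estimate(*small_leaf), raw_estimate(*large_leaf))
print(laplace_estimate(*small_leaf), laplace_estimate(*large_leaf))
print(bagged_estimate([small_leaf, large_leaf]))
```

With the correction, the leaf backed by more evidence receives the higher score, which breaks the tie that raw frequencies produce and so yields a finer-grained ranking.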
Pages: 199-215
Number of pages: 16