Explaining the success of AdaBoost and random forests as interpolating classifiers

Cited by: 0
Authors
Wyner, Abraham J. [1 ]
Olson, Matthew [1 ]
Bleich, Justin [1 ]
Mease, David [2 ]
Affiliations
[1] Department of Statistics, Wharton School, University of Pennsylvania, Philadelphia, PA 19104, United States
[2] Apple Inc., United States
DOI: not available
Abstract
There is a large literature explaining why AdaBoost is a successful classifier. The literature on AdaBoost focuses on classifier margins and boosting's interpretation as the optimization of an exponential likelihood function. These existing explanations, however, have been pointed out to be incomplete. A random forest is another popular ensemble method for which there is substantially less explanation in the literature. We introduce a novel perspective on AdaBoost and random forests that proposes that the two algorithms work for similar reasons. While both classifiers achieve similar predictive accuracy, random forests cannot be conceived as a direct optimization procedure. Rather, random forests is a self-averaging, interpolating algorithm which creates what we denote as a spiked-smooth classifier, and we view AdaBoost in the same light. We conjecture that both AdaBoost and random forests succeed because of this mechanism. We provide a number of examples to support this explanation. In the process, we question the conventional wisdom that suggests that boosting algorithms for classification require regularization or early stopping and should be limited to low complexity classes of learners, such as decision stumps. We conclude that boosting should be used like random forests: with large decision trees, without regularization or early stopping. © 2017 Abraham J. Wyner, Matthew Olson, Justin Bleich, and David Mease.
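
The abstract's central claim can be checked with a short experiment: run AdaBoost with large decision trees, many rounds, and no regularization or early stopping alongside a random forest, and observe that both fit a noisy training set nearly perfectly (they interpolate) while still generalizing to held-out data. The sketch below is not the authors' code; it assumes scikit-learn, and the dataset and every parameter choice are illustrative.

# A minimal sketch (not from the paper) of the behavior the abstract
# describes, assuming scikit-learn; dataset and parameters are illustrative.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary problem with 10% label noise, so interpolating the
# training set means fitting the noise points as well.
X, y = make_classification(n_samples=2000, n_features=20, flip_y=0.1,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

models = {
    # Large base trees (far richer than stumps), many rounds, no shrinkage
    # or early stopping. Depth is capped only so a single tree does not
    # already interpolate on its own, which would end boosting after one
    # round in scikit-learn. In scikit-learn < 1.2 the base-learner
    # argument is named base_estimator rather than estimator.
    "AdaBoost, large trees": AdaBoostClassifier(
        estimator=DecisionTreeClassifier(max_depth=8),
        n_estimators=500, random_state=0),
    # A random forest grows unpruned trees by default, so it interpolates.
    "Random forest": RandomForestClassifier(n_estimators=500, random_state=0),
}

for name, clf in models.items():
    clf.fit(X_tr, y_tr)
    print(f"{name}: train acc {clf.score(X_tr, y_tr):.3f}, "
          f"test acc {clf.score(X_te, y_te):.3f}")

On runs like this, both ensembles typically reach training accuracy at or near 1.0 despite the flipped labels, with comparable held-out accuracy, which is the spiked-smooth, self-averaging behavior the paper conjectures drives both methods.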
Pages: 1-33
Related papers
(50 records in total)
  • [31] Corporate Default Prediction with AdaBoost and Bagging Classifiers
    Ramakrishnan, Suresh
    Mirzaei, Maryam
    Bekri, Mahmoud
JURNAL TEKNOLOGI, 2015, 73 (02)
  • [32] Improved algorithm for AdaBoost with SVM base classifiers
    Wang, Xiaodan
    Wu, Chongming
    Meng, Chunying
    Wang, Wei
    PROCEEDINGS OF THE FIFTH IEEE INTERNATIONAL CONFERENCE ON COGNITIVE INFORMATICS, VOLS 1 AND 2, 2006, : 948 - 952
  • [33] Weak Classifiers Selecting based on PSO in AdaBoost
    Li, Rui
    Zhang, Jiurui
    Mao, Li
    2011 INTERNATIONAL CONFERENCE ON FUTURE SOFTWARE ENGINEERING AND MULTIMEDIA ENGINEERING (FSME 2011), 2011, 7 : 6 - 12
  • [34] Multiclass Adaboost and Coupled Classifiers for Object Detection
    Verschae, Rodrigo
    Ruiz-del-Solar, Javier
    PROGRESS IN PATTERN RECOGNITION, IMAGE ANALYSIS AND APPLICATIONS, PROCEEDINGS, 2008, 5197 : 560 - 567
  • [35] Exploring AdaBoost and Random Forests machine learning approaches for infrared pathology on unbalanced data sets
    Tang, Jiayi
    Henderson, Alex
    Gardner, Peter
    ANALYST, 2021, 146 (19) : 5880 - 5891
  • [36] Predicting University Students' Academic Success and Major Using Random Forests
    Beaulac, Cedric
    Rosenthal, Jeffrey S.
    RESEARCH IN HIGHER EDUCATION, 2019, 60 (07) : 1048 - 1064
  • [37] Interpolating missing land cover data using stochastic spatial random forests for improved change detection
    Holloway-Brown, Jacinta
    Helmstedt, Kate J.
    Mengersen, Kerrie L.
    REMOTE SENSING IN ECOLOGY AND CONSERVATION, 2021, 7 (04) : 649 - 665
  • [39] From Random Forests to Flood Forecasts A Research to Operations Success Story
    Schumacher, Russ S.
    Hill, Aaron J.
    Klein, Mark
    Nelson, James A.
    Erickson, Michael J.
    Trojniak, Sarah M.
    Herman, Gregory R.
    BULLETIN OF THE AMERICAN METEOROLOGICAL SOCIETY, 2021, 102 (09) : E1742 - E1755
  • [40] Binary classifiers versus AdaBoost for labeling of digital documents
    Montejo-Raez, Arturo
    Alfonso Urena-Lopez, Luis
PROCESAMIENTO DEL LENGUAJE NATURAL, 2006, (37): 319 - 326