Ensemble of optimal trees, random forest and random projection ensemble classification

Cited by: 48
Authors
Khan, Zardad [1, 2]
Gul, Asma [2, 3]
Perperoglou, Aris [2]
Miftahuddin, Miftahuddin [2, 4]
Mahmoud, Osama [2, 5, 6]
Adler, Werner [7]
Lausen, Berthold [2]
Affiliations
[1] Abdul Wali Khan Univ, Dept Stat, Mardan, Pakistan
[2] Univ Essex, Dept Math Sci, Colchester CO4 3SQ, Essex, England
[3] Shaheed Benazir Bhutto Women Univ, Dept Stat, Peshawar, Pakistan
[4] Syiah Kuala Univ, Coll Sci, Banda Aceh, Indonesia
[5] Helwan Univ, Dept Appl Stat, Cairo, Egypt
[6] Univ Bristol, Sch Social & Community Med, Bristol BS8 2BN, Avon, England
[7] Univ Erlangen Nurnberg, Dept Biometry & Epidemiol, Erlangen, Germany
Keywords
Ensemble classification; Ensemble regression; Random forest; Random projection ensemble classification; Accuracy and diversity; CLASSIFIERS; ALGORITHMS;
DOI
10.1007/s11634-019-00364-9
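Chinese Library Classification number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Discipline classification code
020208; 070103; 0714
Abstract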
The predictive performance of a random forest ensemble is highly associated with the strength of individual trees and their diversity. An ensemble of a small number of accurate and diverse trees, if prediction accuracy is not compromised, will also reduce the computational burden. We investigate the idea of integrating trees that are accurate and diverse. For this purpose, we utilize out-of-bag observations as a validation sample from the training bootstrap samples to choose the best trees based on their individual performance, and then assess these trees for diversity using the Brier score on an independent validation sample. Starting from the first best tree, a tree is selected for the final ensemble if its addition to the forest reduces the error of the trees that have already been added. Our approach does not use an implicit dimension reduction for each tree, as random projection ensemble classification does. A total of 35 benchmark problems on classification and regression are used to assess the performance of the proposed method and compare it with random forest, random projection ensemble, node harvest, support vector machine, kNN, and classification and regression tree. We compute unexplained variances or classification error rates for all the methods on the corresponding data sets. Our experiments reveal that the size of the ensemble is reduced significantly and better results are obtained in most of the cases. Results of a simulation study are also given, in which four tree-style scenarios are considered to generate data sets with several structures.
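The two-phase selection described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes binary classification, assumes each tree's out-of-bag error and its predicted class-1 probabilities on a separate validation sample are already available, and uses hypothetical names (`brier_score`, `select_trees`, `n_best`) and made-up toy numbers.

```python
def brier_score(probs, labels):
    """Mean squared difference between predicted P(class 1) and 0/1 labels."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(labels)


def select_trees(oob_errors, val_probs, val_labels, n_best):
    """Greedy tree selection (illustrative sketch, binary classification).

    oob_errors: per-tree out-of-bag error (lower is better).
    val_probs:  val_probs[t][i] is tree t's predicted P(class 1) for
                validation observation i.
    val_labels: 0/1 labels of the independent validation sample.
    n_best:     number of top-ranked trees to consider (hypothetical knob).
    Returns the indices of the selected trees in the order they were added.
    """
    # Phase 1: rank trees by individual out-of-bag error, keep the best ones.
    ranked = sorted(range(len(oob_errors)), key=lambda t: oob_errors[t])[:n_best]

    # Phase 2: start from the best tree; add the next tree only if the Brier
    # score of the averaged ensemble prediction on the validation sample drops.
    selected = [ranked[0]]
    sums = list(val_probs[ranked[0]])
    best = brier_score(sums, val_labels)
    for t in ranked[1:]:
        cand_sums = [s + p for s, p in zip(sums, val_probs[t])]
        k = len(selected) + 1
        cand = brier_score([s / k for s in cand_sums], val_labels)
        if cand < best:
            selected.append(t)
            sums, best = cand_sums, cand
    return selected


# Toy inputs (invented for illustration only).
val_labels = [1, 0, 1, 0]
oob_errors = [0.10, 0.30, 0.15, 0.20, 0.50]
val_probs = [
    [0.9, 0.2, 0.6, 0.4],  # tree 0
    [0.5, 0.5, 0.5, 0.5],  # tree 1
    [0.6, 0.4, 0.9, 0.1],  # tree 2
    [0.4, 0.6, 0.4, 0.6],  # tree 3
    [0.1, 0.9, 0.1, 0.9],  # tree 4
]
selected = select_trees(oob_errors, val_probs, val_labels, n_best=3)
print(selected)  # [0, 2]
```

In this toy run, trees 0, 2 and 3 survive the out-of-bag ranking; tree 2 lowers the ensemble Brier score and is added, while tree 3 raises it and is discarded, so the final ensemble is smaller than the candidate pool.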
Pages: 97-116
Page count: 20
Related papers
50 in total
  • [1] Ensemble of optimal trees, random forest and random projection ensemble classification
    Zardad Khan
    Asma Gul
    Aris Perperoglou
    Miftahuddin Miftahuddin
    Osama Mahmoud
    Werner Adler
    Berthold Lausen
    Advances in Data Analysis and Classification, 2020, 14 : 97 - 116
  • [2] Random-projection ensemble classification
    Cannings, Timothy I.
    Samworth, Richard J.
    JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-STATISTICAL METHODOLOGY, 2017, 79 (04) : 959 - 1035
  • [3] Random projection ensemble adaptive nearest neighbor classification
    Kang, Jongkyeong
    Jhun, Myoungshic
    KOREAN JOURNAL OF APPLIED STATISTICS, 2021, 34 (03) : 401 - 410
  • [4] Random Projection Ensemble Classifiers
    Schclar, Alon
    Rokach, Lior
    ENTERPRISE INFORMATION SYSTEMS-BK, 2009, 24 : 309 - +
  • [5] Random forest ensemble classification based fuzzy logic
    Ben Ayed, Abdelkarim
    Benhammouda, Marwa
    Ben Halima, Mohamed
    Alimi, Adel M.
    NINTH INTERNATIONAL CONFERENCE ON MACHINE VISION (ICMV 2016), 2017, 10341
  • [6] An Ensemble System with Random Projection and Dynamic Ensemble Selection
    Manh Truong Dang
    Anh Vu Luong
    Tuyet-Trinh Vu
    Quoc Viet Hung Nguyen
    Tien Thanh Nguyen
    Stantic, Bela
    INTELLIGENT INFORMATION AND DATABASE SYSTEMS, ACIIDS 2018, PT I, 2018, 10751 : 576 - 586
  • [7] A New Random Forest Ensemble of Intuitionistic Fuzzy Decision Trees
    Ren, Yingtao
    Zhu, Xiaomin
    Bai, Kaiyuan
    Zhang, Runtong
    IEEE TRANSACTIONS ON FUZZY SYSTEMS, 2023, 31 (05) : 1729 - 1741
  • [8] Secondary triage classification using an ensemble random forest technique
    Azeez, Dhifaf
    Gan, K. B.
    Ali, M. A. Mohd
    Ismail, M. S.
    TECHNOLOGY AND HEALTH CARE, 2015, 23 (04) : 419 - 428
  • [9] Cell image classification based on ensemble features and random forest
    Ko, B. C.
    Gim, J. W.
    Nam, J. Y.
    ELECTRONICS LETTERS, 2011, 47 (11) : 638 - U72
  • [10] Random projection ensemble conformal prediction for high-dimensional classification
    Qian, Xiaoyu
    Wu, Jinru
    Wei, Ligong
    Lin, Youwu
    CHEMOMETRICS AND INTELLIGENT LABORATORY SYSTEMS, 2024, 253