Compressing Random Forests

Authors
Painsky, Amichai [1]
Rosset, Saharon [1]
Affiliations
[1] Tel Aviv Univ, Dept Stat, IL-69978 Tel Aviv, Israel
Funding
Israel Science Foundation
Keywords
Random Forest; Compression; Entropy Coding
DOI
10.1109/ICDM.2016.72
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Ensemble methods are considered among the state-of-the-art predictive modeling approaches. Applied to modern big data, these methods often require a large number of sub-learners, where the complexity of each learner typically grows with the size of the dataset. This phenomenon results in an increasing demand for storage space, which may be very costly. The problem is most pronounced in a subscriber-based environment, where a user-specific ensemble needs to be stored on a personal device with strict storage limitations (such as a cellular device). In this work we introduce a novel method for lossless compression of tree-based ensemble methods, focusing on Random Forests. Our suggested method is based on probabilistic modeling of the ensemble's trees, followed by model clustering via Bregman divergence. This allows us to find a minimal set of models that provides an accurate description of the trees and, at the same time, is small enough to store and maintain. Our compression scheme demonstrates high compression rates on a variety of modern datasets. Importantly, our scheme enables predictions from the compressed format and a perfect reconstruction of the original ensemble.
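The abstract's idea of clustering probabilistic models with a Bregman divergence can be illustrated with a small sketch. This is not the authors' algorithm: it is a generic Lloyd-style clustering that uses the Kullback-Leibler divergence (one member of the Bregman family) as the distortion and the arithmetic mean as the cluster centroid. The toy distributions in `node_dists` and all function names are hypothetical, chosen only for illustration. The quantity `excess` is the expected number of extra bits per symbol paid when each distribution is entropy-coded with its cluster's shared model instead of its own.

```python
import math

def kl_bits(p, q):
    """KL divergence D(p || q) in bits; infinite if q is zero where p is not."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            if qi == 0:
                return math.inf
            total += pi * math.log2(pi / qi)
    return total

def mean_dist(dists):
    """Arithmetic mean of distributions; this minimizes the summed KL to the centroid."""
    n = len(dists)
    return [sum(col) / n for col in zip(*dists)]

def bregman_cluster(dists, init_centroids, iters=20):
    """Lloyd-style clustering with KL divergence as the distortion measure."""
    centroids = [list(c) for c in init_centroids]  # copy, do not mutate the input
    labels = [0] * len(dists)
    for _ in range(iters):
        # Assignment step: nearest centroid under D(p || centroid).
        labels = [min(range(len(centroids)), key=lambda k: kl_bits(p, centroids[k]))
                  for p in dists]
        # Update step: centroid becomes the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(dists, labels) if lab == k]
            if members:
                centroids[k] = mean_dist(members)
    return labels, centroids

# Hypothetical empirical distributions (e.g. over which feature a node splits on).
node_dists = [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1], [0.1, 0.2, 0.7], [0.1, 0.3, 0.6]]
labels, centroids = bregman_cluster(node_dists, [node_dists[0], node_dists[2]])

# Extra bits per symbol from coding with shared models: two clusters vs. one model.
excess = sum(kl_bits(p, centroids[lab]) for p, lab in zip(node_dists, labels))
single = mean_dist(node_dists)
excess_single = sum(kl_bits(p, single) for p in node_dists)
```

In this toy setting, two shared models cost far fewer excess bits than a single model while storing only two distributions instead of four, which is the trade-off between description accuracy and model storage that the abstract describes.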
Pages: 1131-1136
Page count: 6