A novel approach for discretizing continuous attributes based on tree ensemble and moment matching optimization

Authors
Haddouchi Maissae
Berrado Abdelaziz
Affiliation
[1] Ecole Mohammadia d’Ingénieurs (EMI), Mohammed V University in Rabat
Keywords
Multivariate discretization; Data preprocessing; Tree ensemble; Moment matching; Random forest; Split points selection
DOI
Not available
Abstract
This paper introduces ForestDisc, an optimized, supervised, multivariate, and nonparametric discretization algorithm based on tree ensemble learning and moment matching optimization. For each continuous attribute in the data space, ForestDisc applies moment matching to select popular split points from among those generated while constructing a random forest model. An extensive empirical study involving 50 benchmark datasets and six classification algorithms shows that ForestDisc is highly competitive with 20 major discretizers on both intrinsic and extrinsic performance measures. The intrinsic metrics include the number of resulting bins per variable and the execution time needed to discretize an attribute. The extrinsic metrics concern the performance of the discretizers when applied as a preprocessing step to classification tasks, and include accuracy, F1, and Kappa measures. The ForestDisc discretizer also offers an excellent trade-off between intrinsic and extrinsic performance measures.
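The core idea in the abstract, choosing a few cut points whose distribution resembles that of all the split points a forest produced for one attribute, can be illustrated with a small sketch. This is not the authors' implementation: the function names (`raw_moments`, `select_cut_points`, `discretize`), the choice of the first four raw moments, the squared-error objective, the brute-force search, and the hard-coded `splits` list (standing in for thresholds that would be collected from a trained random forest) are all assumptions made for illustration.

```python
from itertools import combinations


def raw_moments(values, orders=(1, 2, 3, 4)):
    """Return the raw moments E[x^p] of `values` for each order p."""
    n = len(values)
    return [sum(v ** p for v in values) / n for p in orders]


def select_cut_points(candidates, k):
    """Pick k distinct candidate values whose moments best match the full set.

    Assumption: an exhaustive search over subsets stands in for the
    numerical optimizer a real implementation would need; it is only
    practical for a small number of distinct candidates.
    """
    target = raw_moments(candidates)
    distinct = sorted(set(candidates))
    best, best_err = None, float("inf")
    for subset in combinations(distinct, k):
        # Squared distance between the subset's moments and the target moments.
        err = sum((m - t) ** 2 for m, t in zip(raw_moments(subset), target))
        if err < best_err:
            best, best_err = subset, err
    return list(best), best_err


def discretize(x, cut_points):
    """Return the bin index of x; a value equal to a cut point falls in the lower bin."""
    return sum(x > c for c in cut_points)


# Hypothetical split thresholds for one attribute, as if gathered from
# every tree of a random forest (repeats mean a threshold was popular).
splits = [2.45, 2.45, 2.45, 2.6, 4.75, 4.75, 4.85, 4.95, 4.95, 5.05, 5.15]

cuts, error = select_cut_points(splits, k=2)
print("selected cut points:", cuts)
print("bins for 1.0, 3.0, 6.0:", [discretize(x, cuts) for x in (1.0, 3.0, 6.0)])
```

Because raw moments of different orders live on very different scales, a more careful version would probably rescale the attribute or weight the moment errors; the sketch leaves that out to keep the matching step visible.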
Pages: 45-63
Number of pages: 18
Published in
International Journal of Data Science and Analytics, 2022, 14(1): 45-63