Performance Improvement Validation of Decision Tree Algorithms with Non-normalized Information Distance in Experiments

Cited by: 0
Authors
Araki, Takeru [1]
Luo, Yuan [1]
Guo, Minyi [1]
Affiliations
[1] Shanghai Jiao Tong Univ, Dept Comp Sci & Engn, Shanghai, Peoples R China
Keywords
Decision tree; ID3; algorithm; Information distance; Information gain; Gain ratio;
DOI
10.1007/978-3-031-20862-1_33
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The performance of the ID3 decision tree algorithm depends on information gain, which has a known drawback: it tends to select attributes with many values as branching attributes. The gain ratio (notably in C4.5) was proposed to improve on information gain, but it does not always improve performance, nor is it always defined. Some researchers have used normalized information distance to improve on the gain ratio; however, it is ineffective. In this paper, we investigate two non-normalized information distance selection criteria as replacements for information gain and the gain ratio, and we conduct detailed experiments on 13 datasets classified into four types, together with theoretical analysis. Surprisingly, on datasets where the number of values per attribute differs greatly, i.e., Type1 and Type2, algorithms based on non-normalized information distance can improve on the accuracy of the ID3 algorithm by about 15-25%. The first reason is that more values for an attribute do not reduce the distances, as suggested by Mantaras. The second reason is that the conditional entropy opposite to the one used in information gain can counterbalance the bias toward multi-valued attributes. Furthermore, our methods can maintain results comparable to those of existing algorithms in the other cases. Compared with the gain ratio, the algorithms with non-normalized information distances overcome this drawback much better on Type1 datasets, which is strongly confirmed by experiments and the corresponding analysis. This suggests that "normalization"-based improvements such as normalized information distance and the gain ratio are not always effective.
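The selection criteria named in the abstract can be illustrated with a minimal Python sketch. The abstract does not specify the paper's two non-normalized criteria, so the `information_distance` function below is an assumption: it uses the unnormalized Mantaras-style distance H(A|C) + H(C|A) between an attribute A and the class C. All function names and the toy data are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def conditional_entropy(target, given):
    """H(target | given), computed as H(target, given) - H(given)."""
    return entropy(list(zip(target, given))) - entropy(given)


def information_gain(attr, cls):
    """ID3 criterion: H(C) - H(C | A). Larger is better."""
    return entropy(cls) - conditional_entropy(cls, attr)


def gain_ratio(attr, cls):
    """C4.5 criterion: gain / H(A). Returns None when H(A) is zero (undefined)."""
    split_info = entropy(attr)
    if split_info == 0:
        return None
    return information_gain(attr, cls) / split_info


def information_distance(attr, cls):
    """Assumed non-normalized distance: H(C | A) + H(A | C). Smaller is better."""
    return conditional_entropy(cls, attr) + conditional_entropy(attr, cls)


# Toy data (hypothetical): a many-valued ID-like attribute and a two-valued one.
cls = ["yes", "yes", "no", "no"]
attr_id = [1, 2, 3, 4]
attr_b = ["a", "a", "b", "b"]

print(information_gain(attr_id, cls), information_gain(attr_b, cls))
print(gain_ratio(attr_id, cls), gain_ratio(attr_b, cls))
print(information_distance(attr_id, cls), information_distance(attr_b, cls))
```

On this toy data, information gain cannot separate the two attributes, while the distance penalizes the many-valued attribute through the H(A|C) term, which is the "opposite" conditional entropy the abstract refers to. A constant attribute shows the case where the gain ratio is undefined.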
Pages: 450-464
Page count: 15
Related papers
50 in total
  • [31] Performance Improvement of C4.5 Algorithm using Difference Values Nodes in Decision Tree
    Nugroho, Handoyo Widi
    Adji, Teguh Bharata
    Setiawan, Noor Akhmad
    2018 6TH INTERNATIONAL CONFERENCE ON CYBER AND IT SERVICE MANAGEMENT (CITSM), 2018, : 334 - 339
  • [32] Performance analysis of advanced decision tree-based ensemble learning algorithms for landslide susceptibility mapping
    Sahin, Emrehan Kutlug
    Colkesen, Ismail
    GEOCARTO INTERNATIONAL, 2021, 36 (11) : 1253 - 1275
  • [33] Discharge performance of a submerged seawater intake in unsteady flows: Combination of physical models and decision tree algorithms
    Firozjaei, Mahmood Rahmani
    Hajebi, Zahra
    Naeeni, Seyed Taghi Omid
    Akbari, Hassan
    JOURNAL OF WATER PROCESS ENGINEERING, 2024, 60
  • [34] Exploiting clustering and decision-tree algorithms to mine LTL assertions containing non-boolean expressions
    Germiniani, Samuele
    Pravadelli, Graziano
    PROCEEDINGS OF THE 2022 IFIP/IEEE 30TH INTERNATIONAL CONFERENCE ON VERY LARGE SCALE INTEGRATION (VLSI-SOC), 2022,
  • [35] Optimisations of four imputation frameworks for performance exploring based on decision tree algorithms in big data analysis problems
    Bektas, Jale
    Ibrikci, Turgay
    INTERNATIONAL JOURNAL OF COMPUTATIONAL SCIENCE AND ENGINEERING, 2022, 25 (05) : 523 - 531
  • [36] Learning Optimization for Decision Tree Classification of Non-categorical Data with Information Gain Impurity Criterion
    Sofeikov, K. I.
    Tyukin, I. Yu.
    Gorban, A. N.
    Mirkes, E. M.
    Prokhorov, D. V.
    Romanenko, I. V.
    PROCEEDINGS OF THE 2014 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2014, : 3548 - 3555
  • [37] Performance improvement of direction finding algorithms in non-homogeneous environment through data fusion
    Cherchar, Afnmar
    Thameri, Messaoud
    Belouchrani, Adel
    DIGITAL SIGNAL PROCESSING, 2015, 41 : 41 - 47
  • [38] Assessment of associations between transition diseases and reproductive performance of dairy cows using survival analysis and decision tree algorithms
    Pascottini, O. Bogado
    Probo, M.
    LeBlanc, S. J.
    Opsomer, G.
    Hostens, M.
    PREVENTIVE VETERINARY MEDICINE, 2020, 176
  • [39] Comparison Study on Classification Performance for Short-term Urban Traffic Flow Condition Using Decision Tree Algorithms
    Wang, Jiao-Jiao
    Wang, Jin-Feng
    Lu, Feng
    Cao, Zhi-Dong
    Liao, Yi-Lan
    Deng, Yu
    2009 WRI WORLD CONGRESS ON SOFTWARE ENGINEERING, VOL 4, PROCEEDINGS, 2009, : 434 - +
  • [40] A Comprehensive Decision-Making Approach Based on Hierarchical Attribute Model for Information Fusion Algorithms' Performance Evaluation
    Li, Lianhui
    Mo, Rong
    MATHEMATICAL PROBLEMS IN ENGINEERING, 2014, 2014