Performance Improvement Validation of Decision Tree Algorithms with Non-normalized Information Distance in Experiments

Authors
Araki, Takeru [1]
Luo, Yuan [1]
Guo, Minyi [1]
Affiliations
[1] Shanghai Jiao Tong Univ, Dept Comp Sci & Engn, Shanghai, Peoples R China
Keywords
Decision tree; ID3; algorithm; Information distance; Information gain; Gain ratio
DOI
10.1007/978-3-031-20862-1_33
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The performance of the ID3 decision tree algorithm depends on the information gain, which has the drawback of tending to select attributes with many values as branching attributes. The gain ratio (used notably in C4.5) was proposed to improve on the information gain, but it does not always improve performance, nor is it always defined. Some researchers have used a normalized information distance to improve on the gain ratio, but this is ineffective. In this paper, we investigate two non-normalized information distance selection criteria as replacements for the information gain and the gain ratio, and we conduct detailed experiments, supported by theoretical analysis, on 13 datasets classified into four types. Surprisingly, on datasets where the number of values per attribute differs greatly, i.e., Type1 and Type2, algorithms based on non-normalized information distance improve on the accuracy of the ID3 algorithm by about 15-25%. The first reason is that an attribute having more values does not reduce the distance, as suggested by Mantaras. The second reason is that the conditional entropy opposite to the one used in the information gain counterbalances the bias toward multi-valued attributes. Furthermore, our methods maintain results comparable to those of existing algorithms in the other cases. Compared to the gain ratio, the algorithms with non-normalized information distances overcome the drawback much better on Type1 datasets, which is strongly confirmed by experiments and the corresponding analysis. This suggests that "normalization"-based improvements, such as the normalized information distance and the gain ratio, are not always effective.
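The selection criteria named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not define its two non-normalized criteria, so the sketch assumes the classical unnormalized information distance H(C|A) + H(A|C) between the class C and an attribute A. All function names and the toy data are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return sum(-(c / n) * log2(c / n) for c in Counter(values).values())


def conditional_entropy(target, given):
    """H(target | given): entropy of target averaged over the groups of given."""
    n = len(target)
    groups = {}
    for t, g in zip(target, given):
        groups.setdefault(g, []).append(t)
    return sum(len(ts) / n * entropy(ts) for ts in groups.values())


def information_gain(attr, labels):
    """ID3 criterion: H(C) - H(C|A). Larger is better."""
    return entropy(labels) - conditional_entropy(labels, attr)


def gain_ratio(attr, labels):
    """C4.5-style criterion: gain / H(A). Returns None when H(A) is zero."""
    split_info = entropy(attr)
    if split_info == 0:
        return None  # undefined for an attribute with a single value
    return information_gain(attr, labels) / split_info


def information_distance(attr, labels):
    """Assumed unnormalized distance: H(C|A) + H(A|C). Smaller is better."""
    return conditional_entropy(labels, attr) + conditional_entropy(attr, labels)


# Toy data (hypothetical): an ID-like attribute with one value per row,
# a two-valued attribute that separates the classes, and a constant attribute.
labels = ["yes", "yes", "no", "no"]
attr_id = [1, 2, 3, 4]
attr_good = ["a", "a", "b", "b"]
attr_const = ["x", "x", "x", "x"]
```

On this toy data the information gain cannot distinguish `attr_id` from `attr_good`, since both predict the class perfectly. The assumed distance is larger for `attr_id` because the H(A|C) term grows with the number of attribute values, and `gain_ratio` returns `None` for `attr_const` because the split information is zero.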
Pages: 450-464
Number of pages: 15