Making use of functional dependencies based on data to find better classification trees

Cited by: 0
Author
Sug, Hyontai [1]
Affiliation
[1] Dept. of Computer Eng., Dongseo University, 47 Jurye-ro, Sasang-gu, Busan, 47011, Korea, Republic of
Keywords
Categorical attributes - Chi-square tests - Classification tasks - Classification trees - Functional dependency - Knowledge model - Machine learning algorithms - Novel methods - Preprocessing - Understandability;
DOI
10.46300/9106.2021.15.160
Abstract
In classification tasks, machine learning algorithms rely on the conditional attributes being independent of each other; this independence is a precondition for successful data mining. Decision trees are among the most widely used machine learning algorithms because they are easy to understand. Since dependency between conditional attributes can produce more complex trees, it is important to supply conditional attributes that are independent of each other: for decision trees, as for other machine learning algorithms, the conditional attributes should be mutually independent and depend only on the decision attribute. The standard statistical method for checking independence between attributes is the Chi-square test, but it is effective only for categorical attributes. Its applicability is therefore limited, because most datasets used in data mining mix categorical and numerical attributes. To overcome this problem, a novel method for testing dependency between conditional attributes is proposed, based on functional dependencies derived from the data; it can be applied to any dataset regardless of the data types of its attributes. Removing conditional attributes that are highly dependent on other conditional attributes makes it possible to generate better decision trees. Experiments were performed to evaluate the method, and they showed very good results.
© 2021, North Atlantic University Union NAUN. All rights reserved.
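The idea of measuring dependency between conditional attributes through data-based functional dependencies can be illustrated with a small sketch. This is not the author's method: it assumes a standard approximate-FD measure (one minus the g3 error, i.e. the largest fraction of rows that satisfy X → Y exactly) and a greedy rule that drops an attribute when another kept attribute determines it with degree at or above a threshold. The function names `fd_degree` and `drop_dependent`, the 0.95 threshold, and the example rows are all hypothetical. Because only value equality is used, the sketch runs on categorical and numerical columns alike; note, however, that a near-unique numerical column trivially determines every other column, so discretizing such columns first may be needed in practice.

```python
from collections import Counter, defaultdict
from itertools import permutations


def fd_degree(rows, lhs, rhs):
    """Fraction of rows consistent with the approximate FD lhs -> rhs.

    For each distinct lhs value, the rows carrying that group's most
    frequent rhs value are counted as consistent (1 minus the g3 error).
    Only equality of values is used, so any hashable type works.
    """
    groups = defaultdict(Counter)
    for row in rows:
        groups[row[lhs]][row[rhs]] += 1
    consistent = sum(c.most_common(1)[0][1] for c in groups.values())
    return consistent / len(rows)


def drop_dependent(rows, attrs, threshold=0.95):
    """Greedily drop conditional attributes determined by another kept one.

    `attrs` should list conditional attributes only (not the decision
    attribute). The 0.95 threshold is an illustrative assumption.
    """
    kept = list(attrs)
    for a, b in permutations(attrs, 2):
        if a in kept and b in kept and fd_degree(rows, a, b) >= threshold:
            kept.remove(b)  # b is (almost) a function of a, so it adds little
    return kept


# Hypothetical mixed-type data: "weight" is numerical and determined by "size".
rows = [
    {"size": "S", "weight": 1.0, "color": "red"},
    {"size": "S", "weight": 1.0, "color": "blue"},
    {"size": "M", "weight": 2.5, "color": "red"},
    {"size": "L", "weight": 4.0, "color": "blue"},
]
print(fd_degree(rows, "size", "color"))                   # 0.75
print(drop_dependent(rows, ["size", "weight", "color"]))  # ['size', 'color']
```

The greedy pass keeps whichever attribute of a dependent pair appears first in `attrs`; a different ordering, or a choice based on each attribute's relevance to the decision attribute, would be equally reasonable variants.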
Pages: 1475-1485