A hybrid evolutionary approach to construct optimal decision trees with large data sets

Cited by: 0
Authors
Patil, D. V. [1]
Bichkar, R. S. [2]
Affiliations
[1] SGGS Inst Engn & Tech, Nanded, MS, India
[2] SGGS Inst Engn & Tech, Dept Elect & Telecommun Engn, Nanded, MS, India
Keywords
large data sets; decision tree; genetic algorithm; genetically evolved decision tree; training set size; classification accuracy; comprehensibility
DOI
Not available
Chinese Library Classification (CLC) number
T [Industrial Technology]
Subject classification code
08
Abstract
Data mining environments produce large volumes of data. The knowledge contained in this data can be used to improve an organization's decision-making process. When such large amounts of data are used for decision tree construction, the resulting trees are large and incomprehensible to human experts. Learning on high-volume data also becomes very slow, because it has to be done serially over the available large data sets. Our ultimate goal is to build smaller trees that are equally accurate, using randomly sampled data. We experimented with techniques based on incremental random sampling combined with genetic algorithms, which use global search to evolve decision trees and obtain a compact representation of a large data set. Experiments on several data sets showed that the proposed random sampling procedures, combined with genetic algorithms for building decision trees, yield relatively smaller trees than other methods while being equally accurate. The method combines optimization with comprehensibility and scalability. We explored how the technique can avoid problems such as slow execution and memory and processor overload on very large databases.
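The abstract names the ingredients (incremental random sampling, a genetic algorithm, a preference for small trees) but gives no details, so the following is only a minimal Python sketch of how such a loop might look, not the authors' method. The tuple-based tree encoding, the size-penalized fitness, the crossover and mutation operators, the stopping rule, and every function and parameter name (`incremental_ga`, `size_penalty`, `tolerance`, and so on) are assumptions made for illustration.

```python
import random

# Tree encoding (an assumption of this sketch): ("leaf", label) or
# ("node", feature_index, threshold, left_subtree, right_subtree).

def random_tree(rows, depth, rng):
    """Grow a random tree whose thresholds and labels come from sampled rows."""
    features, label = rng.choice(rows)
    if depth == 0 or rng.random() < 0.3:
        return ("leaf", label)
    i = rng.randrange(len(features))
    return ("node", i, features[i],
            random_tree(rows, depth - 1, rng), random_tree(rows, depth - 1, rng))

def predict(tree, features):
    while tree[0] == "node":
        _, i, threshold, left, right = tree
        tree = left if features[i] <= threshold else right
    return tree[1]

def size(tree):
    return 1 if tree[0] == "leaf" else 1 + size(tree[3]) + size(tree[4])

def accuracy(tree, rows):
    return sum(predict(tree, x) == y for x, y in rows) / len(rows)

def fitness(tree, rows, size_penalty=0.005):
    # Assumed fitness: reward accuracy, penalize node count to favor small trees.
    return accuracy(tree, rows) - size_penalty * size(tree)

def subtrees(tree, path=()):
    """Yield (path, subtree) pairs; a path is a tuple of child slots (3 or 4)."""
    yield path, tree
    if tree[0] == "node":
        yield from subtrees(tree[3], path + (3,))
        yield from subtrees(tree[4], path + (4,))

def replace(tree, path, new):
    """Return a copy of tree with the subtree at path swapped for new."""
    if not path:
        return new
    parts = list(tree)
    parts[path[0]] = replace(tree[path[0]], path[1:], new)
    return tuple(parts)

def crossover(a, b, rng):
    path, _ = rng.choice(list(subtrees(a)))
    _, donor = rng.choice(list(subtrees(b)))
    return replace(a, path, donor)

def mutate(tree, rows, rng):
    path, _ = rng.choice(list(subtrees(tree)))
    return replace(tree, path, random_tree(rows, 2, rng))

def evolve(rows, population, generations, rng):
    """Keep the fitter half each generation and refill with offspring."""
    for _ in range(generations):
        population.sort(key=lambda t: fitness(t, rows), reverse=True)
        elite = population[: len(population) // 2]
        children = []
        while len(elite) + len(children) < len(population):
            child = crossover(rng.choice(elite), rng.choice(elite), rng)
            if rng.random() < 0.3:
                child = mutate(child, rows, rng)
            children.append(child)
        population = elite + children
    return max(population, key=lambda t: fitness(t, rows)), population

def incremental_ga(data, start=50, step=50, pop_size=30, generations=15,
                   tolerance=0.02, seed=0):
    rng = random.Random(seed)
    pool = data[:]
    rng.shuffle(pool)  # after shuffling, every prefix is a random sample
    n = min(start, len(pool))
    population = [random_tree(pool[:n], 3, rng) for _ in range(pop_size)]
    while True:
        best, population = evolve(pool[:n], population, generations, rng)
        # Assumed stopping rule: stop once the tree does about as well on the
        # rows not yet sampled as on the sample, or when all rows are in use.
        if n == len(pool) or (accuracy(best, pool[n:])
                              >= accuracy(best, pool[:n]) - tolerance):
            return best, n
        n = min(n + step, len(pool))
```

Calling `incremental_ga(data)` on a list of `(features, label)` pairs returns the best tree found and the sample size at which the loop stopped. Growing the sample only when the current tree fails to generalize to the unsampled rows is one way to keep the training set, and therefore the evaluation cost per generation, small.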
Pages: 603 / +
Number of pages: 2