Complexity of rule sets in mining incomplete data using characteristic sets and generalized maximal consistent blocks

Cited by: 3
Authors
Clark, Patrick G. [1]
Gao, Cheng [1]
Grzymala-Busse, Jerzy W. [1,2]
Mroczek, Teresa [2]
Niemiec, Rafal [2]
Affiliations
[1] University of Kansas, Department of Electrical Engineering and Computer Science, Lawrence, KS 66045, USA
[2] University of Information Technology and Management, Department of Expert Systems and Artificial Intelligence, PL-35225 Rzeszow, Poland
Keywords
Incomplete data; characteristic sets; maximal consistent blocks; MLEM2 rule induction algorithm; probabilistic approximations
DOI
10.1093/jigpal/jzaa041
Chinese Library Classification (CLC) number
O29 [Applied Mathematics]
Discipline classification code
070104
Abstract
In this paper, we consider three interpretations of missing attribute values in incomplete data sets: lost values, attribute-concept values and 'do not care' conditions. For rule induction, we use characteristic sets and generalized maximal consistent blocks. Combining the three interpretations with these two options gives six different approaches to data mining. Our previous experiments, in which the main quality criterion was the error rate evaluated by ten-fold cross validation, showed that no approach is universally the best. Therefore, we compare the six approaches by the complexity of the rule sets they induce from incomplete data sets. We show that the smallest rule sets are induced from data sets with attribute-concept values, while the most complicated rule sets are induced from data sets with lost values. For rule-set complexity, the choice of interpretation of missing attribute values matters more than the choice between characteristic sets and generalized maximal consistent blocks.
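To make the two kinds of granules concrete, here is a minimal Python sketch of attribute-value blocks, characteristic sets K_A(x) and generalized maximal consistent blocks on a tiny table. It follows our reading of the definitions used in this line of work and is not the authors' implementation. Assumptions: '?' encodes a lost value, '-' an attribute-concept value and '*' a 'do not care' condition; V(x, a) is the set of specified values of attribute a among cases in the same concept as x; a set Y is treated as consistent when y ∈ K_A(x) for every ordered pair x, y in Y. All function names and the toy data are hypothetical, and the blocks are enumerated by brute force, which only suits very small tables. MLEM2 rule induction and probabilistic approximations are not covered.

```python
from itertools import combinations

# Assumed encoding of the three interpretations of missing attribute values.
LOST, ATTR_CONCEPT, DONT_CARE = "?", "-", "*"
MISSING = {LOST, ATTR_CONCEPT, DONT_CARE}


def concept_values(cases, decisions, x, a):
    """V(x, a): specified values of attribute a among cases in the concept of x."""
    return {
        cases[y][a]
        for y in range(len(cases))
        if decisions[y] == decisions[x] and cases[y][a] not in MISSING
    }


def in_blocks(cases, decisions, y, a, values):
    """True if case y belongs to at least one block [(a, v)] with v in values."""
    value = cases[y][a]
    if value == LOST:
        return False  # a lost value keeps y out of every block of a
    if value == DONT_CARE:
        return bool(values)  # 'do not care' puts y in every block of a
    if value == ATTR_CONCEPT:
        # y joins the blocks of the values specified within its own concept
        return bool(concept_values(cases, decisions, y, a) & values)
    return value in values


def characteristic_set(cases, decisions, x):
    """K_A(x): intersection of K(x, a) over all attributes a."""
    universe = set(range(len(cases)))
    result = set(universe)
    for a, value in enumerate(cases[x]):
        if value in (LOST, DONT_CARE):
            continue  # K(x, a) = U leaves the intersection unchanged
        if value == ATTR_CONCEPT:
            values = concept_values(cases, decisions, x, a)
            if not values:
                continue  # an empty V(x, a) is treated as K(x, a) = U
        else:
            values = {value}
        result &= {y for y in universe if in_blocks(cases, decisions, y, a, values)}
    return result


def generalized_mcbs(cases, decisions):
    """Brute force: maximal sets Y with y in K_A(x) for all x, y in Y."""
    n = len(cases)
    char_sets = [characteristic_set(cases, decisions, x) for x in range(n)]
    found = []
    # Larger candidates first, so any maximal superset is already recorded.
    for size in range(n, 0, -1):
        for block in combinations(range(n), size):
            candidate = set(block)
            consistent = all(y in char_sets[x] for x in candidate for y in candidate)
            if consistent and not any(candidate < f for f in found):
                found.append(candidate)
    return found


# Hypothetical toy table: attributes (Temperature, Headache), decision Flu.
CASES = [
    ["high", "yes"],
    ["?", "yes"],
    ["normal", "*"],
    ["-", "no"],
]
FLU = ["yes", "yes", "no", "no"]

if __name__ == "__main__":
    for x in range(len(CASES)):
        print(x, sorted(characteristic_set(CASES, FLU, x)))
    print([sorted(b) for b in generalized_mcbs(CASES, FLU)])
```

On this toy table the characteristic sets of cases 0 to 3 are {0}, {0, 1, 2}, {2, 3} and {2, 3}: case 1 has a lost temperature, so only Headache constrains it, and the 'do not care' headache of case 2 places it in every Headache block. Under the stated consistency assumption, the maximal consistent blocks are {2, 3}, {0} and {1}.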
Pages: 124–137
Number of pages: 14