Increasing Data Set Incompleteness May Improve Rule Set Quality

Authors
Grzymala-Busse, Jerzy W. [1, 2]
Grzymala-Busse, Witold J. [3]
Affiliations
[1] University of Kansas, Department of Electrical Engineering and Computer Science, Lawrence, KS 66045, USA
[2] Polish Academy of Sciences, Institute of Computer Science, PL-01237 Warsaw, Poland
[3] TouchNet Information Systems Inc., Lenexa, KS 66129, USA
Keywords
Rough set theory; Rule induction; MLEM2 algorithm; Missing attribute values; Lost values; Attribute-concept values; "Do not care" conditions; ROUGH; APPROXIMATIONS
DOI
Not available
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
This paper presents a new methodology for improving the quality of rule sets. We performed a series of data mining experiments on completely specified data sets. In these experiments we removed some specified attribute values, i.e., replaced them with symbols of missing attribute values, and used the resulting incomplete data for rule induction, while the original, complete data sets were used for testing. Rules were induced with the MLEM2 rule induction algorithm of the LERS data mining system, which is based on rough sets; our approach to missing attribute values was based on rough set theory as well. The results show that, for some data sets and some interpretations of missing attribute values, the error rate was smaller than the error rate obtained from the original, complete data sets. Thus, rule sets induced from some data sets may be improved by increasing the incompleteness of those data sets. It appears that when some attribute values are removed, the rule induction system, forced to induce rules from the remaining information, may induce better rule sets.
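The experimental protocol above can be illustrated with a minimal Python sketch. It is not the LERS or MLEM2 implementation: the function names, the dictionary-per-case table layout, the removal `rate` and the random `seed` are hypothetical choices made for illustration. `make_incomplete` builds the training copy in which some specified values are replaced by a missing-value symbol, leaving the complete table available for testing. `characteristic_sets` computes the characteristic sets K_B(x) that rough-set approaches use to handle the three interpretations named in the keywords: lost values (`?`), "do not care" conditions (`*`) and attribute-concept values (`-`). It follows the block and characteristic-set definitions commonly used in Grzymala-Busse's rough-set work, as assumed here.

```python
import random
from typing import Dict, List, Set

Case = Dict[str, str]

# Symbols conventionally used for the three interpretations of missing values
LOST, DO_NOT_CARE, ATTR_CONCEPT = "?", "*", "-"
MISSING = {LOST, DO_NOT_CARE, ATTR_CONCEPT}


def make_incomplete(table: List[Case], attributes: List[str], rate: float,
                    symbol: str, seed: int = 0) -> List[Case]:
    """Hypothetical helper: copy `table`, replacing about `rate` of the
    attribute cells by `symbol`. Only `attributes` are touched, so decision
    values survive, and the original table stays intact for testing."""
    rng = random.Random(seed)
    cells = [(i, a) for i in range(len(table)) for a in attributes]
    chosen = set(rng.sample(cells, round(rate * len(cells))))
    return [{a: (symbol if (i, a) in chosen else v) for a, v in case.items()}
            for i, case in enumerate(table)]


def characteristic_sets(table: List[Case], attributes: List[str],
                        decision: str) -> List[Set[int]]:
    """Hypothetical helper: characteristic set K_B(x) of every case x,
    with B = `attributes` (assumed standard rough-set definitions)."""
    universe = set(range(len(table)))

    def concept_values(x: int, a: str) -> Set[str]:
        # V(x, a): specified values of a among cases in the same concept as x
        return {table[y][a] for y in universe
                if table[y][decision] == table[x][decision]
                and table[y][a] not in MISSING}

    def block(a: str, v: str) -> Set[int]:
        # [(a, v)]: "?" cases belong to no block, "*" cases to every block,
        # "-" cases to the blocks of the values in V(y, a)
        return {y for y in universe
                if table[y][a] == v
                or table[y][a] == DO_NOT_CARE
                or (table[y][a] == ATTR_CONCEPT and v in concept_values(y, a))}

    result = []
    for x in range(len(table)):
        k = set(universe)
        for a in attributes:
            v = table[x][a]
            if v in (LOST, DO_NOT_CARE):
                continue  # K(x, a) = U
            if v == ATTR_CONCEPT:
                values = concept_values(x, a)
                if values:  # K(x, a) = U when V(x, a) is empty
                    k &= set().union(*(block(a, w) for w in values))
            else:
                k &= block(a, v)
        result.append(k)
    return result
```

Because a lost value puts its case in no block of the attribute, while `*` and `-` add the case to all or some of those blocks, the same removed cell can yield different characteristic sets, and hence different approximations and induced rules, under each interpretation.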
Pages: 200 / +
Number of pages: 3