Evaluation and optimization of frequent, closed and maximal association rule based classification

Times cited: 8
Authors
Shaharanee, I. N. M. [1]
Hadzic, F. [2,3]
Affiliations
[1] Univ Utara Malaysia, Sch Quantitat Sci, Sintok, Malaysia
[2] Curtin Univ, Dept Comp, Perth, WA 6845, Australia
[3] Curtin Univ Technol, Dept Comp, Bentley, WA 6102, Australia
Keywords
Rule optimization; Interestingness measures; Statistical analysis; Pattern; Discovery
DOI
10.1007/s11222-013-9404-6
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Real-world applications of association rule mining suffer from the well-known problem of discovering a large number of rules, many of which are neither interesting nor useful for the application at hand. Algorithms for closed and maximal itemset mining significantly reduce the number of rules discovered and the complexity of the task, but the implications of their use, and their important differences in generalization power, precision and recall when applied to classification, have not been examined. In this paper, we present a systematic evaluation of the association rules discovered by frequent, closed and maximal itemset mining algorithms, combining common data mining and statistical interestingness measures, and we outline an appropriate sequence in which to apply them. The experiments are performed on a number of real-world datasets with diverse data/item characteristics, and the rule sets are evaluated in detail both as a whole and with respect to individual classes. Empirical results confirm that, with a proper combination of data mining and statistical analysis, a large number of non-significant, redundant and contradictory rules can be eliminated while preserving relatively high precision and recall. More importantly, the results reveal the important characteristics of, and differences between, frequent, closed and maximal itemsets when used for the classification task, as well as the effect of incorporating statistical/heuristic measures to optimize such rule sets. Since closed itemset mining is already a preferred choice for reducing complexity and redundancy during rule generation, this study further confirms that closed itemset based association rules are also of better overall quality in terms of classification precision and recall, both on the whole and on individual class examples.
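To make the distinction between the three itemset types concrete, here is a minimal brute-force sketch in Python. It is not the mining algorithm used in the paper (brute-force enumeration is exponential in the number of items and only suitable for toy data), and the function names `support` and `mine` and the toy transactions are hypothetical. It uses the standard definitions: an itemset is frequent if its support meets the threshold, closed if no proper superset has the same support, and maximal if no proper superset is frequent.

```python
from itertools import combinations


def support(itemset, transactions):
    """Count the transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def mine(transactions, min_support):
    """Return (frequent, closed, maximal) itemsets by exhaustive enumeration.

    frequent: dict mapping frozenset -> support count.
    closed, maximal: sets of frozensets (maximal <= closed <= frequent).
    """
    items = sorted(set().union(*transactions))
    frequent = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            s = support(frozenset(combo), transactions)
            if s >= min_support:
                frequent[frozenset(combo)] = s
    # A superset with equal support is itself frequent, so checking
    # only among frequent itemsets is enough for closedness.
    closed = {x for x in frequent
              if not any(x < y and frequent[y] == frequent[x] for y in frequent)}
    # Maximal: no proper superset is frequent at all.
    maximal = {x for x in frequent if not any(x < y for y in frequent)}
    return frequent, closed, maximal


# Hypothetical toy data: four transactions over items a, b, c.
transactions = [frozenset(t) for t in
                ({"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"a", "b", "c"})]
frequent, closed, maximal = mine(transactions, min_support=2)
print(len(frequent), len(closed), len(maximal))  # 7 4 1
```

On these four transactions, all 7 non-empty itemsets are frequent at a minimum support of 2, 4 of them ({a}, {a, b}, {a, c}, {a, b, c}) are closed, and only {a, b, c} is maximal. The closed itemsets still determine the support of every frequent itemset (for example, {b} has the support of its smallest closed superset {a, b}), whereas the single maximal itemset does not, which is consistent with maximal-itemset rule sets being smaller and less informative.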
On the other hand, maximal itemset based association rules, which are a subset of closed itemset based rules, prove insufficient in this regard and typically have worse recall and generalization power. Empirical results also show the drawback of applying the confidence measure at the outset to generate association rules, as is typically done within the association rule framework. Removing rules that fall below a certain confidence threshold also removes the knowledge that the data contain contradictions to the remaining, relatively higher-confidence rules; precision can therefore be increased by discarding contradictory rules before the confidence constraint is applied.
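The ordering issue in the last point can be sketched as follows. This is a minimal Python illustration under simplifying assumptions, not the authors' procedure: a rule is a hypothetical `(antecedent, class_label, confidence)` tuple, and a rule counts as contradictory whenever the same antecedent predicts more than one class in the data (the paper's own criteria, which also involve statistical measures, may differ). The functions `class_rules`, `confidence_first` and `contradictions_first` and the toy records are illustrative only.

```python
from collections import defaultdict


def class_rules(records, antecedents):
    """Build every rule `antecedent -> label` observed in the data.

    records: list of (frozenset of items, class label) pairs.
    antecedents: candidate rule bodies, e.g. previously mined itemsets.
    Returns (antecedent, label, confidence) tuples.
    """
    rules = []
    for body in antecedents:
        # Labels of all records that contain every item of the antecedent.
        covered = [label for items, label in records if body <= items]
        if not covered:
            continue
        for label in sorted(set(covered)):
            rules.append((body, label, covered.count(label) / len(covered)))
    return rules


def confidence_first(rules, min_conf):
    """Conventional order: keep only rules meeting the confidence threshold."""
    return [r for r in rules if r[2] >= min_conf]


def contradictions_first(rules, min_conf):
    """Drop every rule whose antecedent predicts more than one class,
    then apply the confidence threshold to the remaining rules."""
    labels_per_body = defaultdict(set)
    for body, label, _ in rules:
        labels_per_body[body].add(label)
    consistent = [r for r in rules if len(labels_per_body[r[0]]) == 1]
    return confidence_first(consistent, min_conf)


# Hypothetical toy records: {x} is followed by both classes, {z} only by "pos".
records = [
    (frozenset({"x", "z"}), "pos"),
    (frozenset({"x"}), "pos"),
    (frozenset({"x"}), "neg"),
    (frozenset({"z"}), "pos"),
]
rules = class_rules(records, [frozenset({"x"}), frozenset({"z"})])
print(len(confidence_first(rules, 0.6)))      # 2
print(len(contradictions_first(rules, 0.6)))  # 1
```

With a threshold of 0.6, filtering by confidence first keeps {x} → pos (confidence 2/3) and silently drops {x} → neg, so the contradiction is no longer visible to later stages; checking for contradictions first discards both {x} rules and keeps only {z} → pos.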
Pages: 821-843
Number of pages: 23