Evaluation and optimization of frequent, closed and maximal association rule based classification

Cited by: 8
Authors
Shaharanee, I. N. M. [1]
Hadzic, F. [2,3]
Affiliations
[1] Univ Utara Malaysia, Sch Quantitat Sci, Sintok, Malaysia
[2] Curtin Univ, Dept Comp, Perth, WA 6845, Australia
[3] Curtin Univ Technol, Dept Comp, Bentley, WA 6102, Australia
Keywords
Rule optimization; Interestingness measures; Statistical analysis; Pattern; Discovery
DOI
10.1007/s11222-013-9404-6
CLC classification number
TP301 [Theory, methods]
Subject classification code
081202
Abstract
Real-world applications of association rule mining suffer from the well-known problem of discovering a large number of rules, many of which are neither interesting nor useful for the application at hand. Algorithms for closed and maximal itemset mining significantly reduce the volume of discovered rules and the complexity of the task, but the implications of their use, and the important differences in generalization power, precision and recall when they are applied to classification, have not been examined. In this paper, we present a systematic evaluation of the association rules discovered by frequent, closed and maximal itemset mining algorithms, combining common data mining and statistical interestingness measures, and we outline an appropriate sequence for applying them. The experiments are performed on a number of real-world datasets with diverse data and item characteristics, and the rule sets are evaluated in detail both as a whole and with respect to individual classes. Empirical results confirm that, with a proper combination of data mining and statistical analysis, a large number of non-significant, redundant and contradictory rules can be eliminated while relatively high precision and recall are preserved. More importantly, the results reveal the key characteristics of, and differences between, frequent, closed and maximal itemsets when used for classification, as well as the effect of incorporating statistical/heuristic measures to optimize such rule sets. Closed itemset mining is already a preferred choice for reducing complexity and redundancy during rule generation; this study further confirms that closed itemset based association rules are also of better overall quality in terms of classification precision and recall, both on the whole and on the examples of individual classes.
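The relationship between the three itemset types can be illustrated with a minimal brute-force sketch on a small hypothetical dataset; the transactions, the `min_support` value and the function names are all assumptions made for illustration. The sketch enumerates every frequent itemset, then keeps as closed those with no proper superset of equal support, and as maximal those with no frequent proper superset. Exhaustive enumeration is used only for clarity; it is not the mining algorithm evaluated in the paper, and practical miners avoid it.

```python
from itertools import combinations

# Hypothetical toy transactions, for illustration only.
TRANSACTIONS = [
    {"a", "b", "c"},
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "d"},
    {"b", "c"},
]


def frequent_itemsets(transactions, min_support):
    """Brute-force enumeration: return {itemset: support_count} for every
    itemset contained in at least min_support transactions."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            count = sum(1 for t in transactions if candidate <= t)
            if count >= min_support:
                result[candidate] = count
    return result


def closed_itemsets(frequent):
    """Closed: no proper superset has the same support.
    (A superset with equal support is itself frequent, so checking
    only the frequent itemsets is sufficient.)"""
    return {
        s: c for s, c in frequent.items()
        if not any(s < other and c == frequent[other] for other in frequent)
    }


def maximal_itemsets(frequent):
    """Maximal: no proper superset is frequent."""
    return {
        s: c for s, c in frequent.items()
        if not any(s < other for other in frequent)
    }


if __name__ == "__main__":
    freq = frequent_itemsets(TRANSACTIONS, min_support=2)
    closed = closed_itemsets(freq)
    maximal = maximal_itemsets(freq)
    print(len(freq), len(closed), len(maximal))  # -> 7 5 1
```

On this toy data the count shrinks from 7 frequent to 5 closed to 1 maximal itemset, and every maximal itemset is also closed. The closed set still determines the exact support of every frequent itemset, whereas the maximal set does not, which gives some intuition for why rules built only from maximal itemsets can lose information.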
On the other hand, maximal itemset based association rules, which are a subset of the closed itemset based rules, prove insufficient in this regard and typically have worse recall and generalization power. Empirical results also show the drawback of applying the confidence measure at the outset to generate association rules, as is typically done within the association rule framework. Removing rules below a certain confidence threshold also removes the knowledge that the data contain contradictions to the relatively higher-confidence rules; precision can therefore be increased by discarding contradictory rules before the confidence constraint is applied.
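The effect of the order of these two steps can be sketched on three hypothetical class association rules with hard-coded confidences. Here a rule is treated as contradictory when another rule in the same set has the same antecedent but predicts a different class; this definition, the `Rule` type and the helper names are assumptions for illustration, not the paper's exact procedure.

```python
from collections import defaultdict
from typing import NamedTuple


class Rule(NamedTuple):
    antecedent: frozenset  # items on the left-hand side
    label: str             # predicted class (right-hand side)
    confidence: float      # hypothetical, hard-coded value


def confidence_filter(rules, min_conf):
    """Keep only rules whose confidence reaches min_conf."""
    return [r for r in rules if r.confidence >= min_conf]


def drop_contradictions(rules):
    """Assumed definition: drop every rule whose antecedent predicts
    more than one class within the given rule set."""
    labels = defaultdict(set)
    for r in rules:
        labels[r.antecedent].add(r.label)
    return [r for r in rules if len(labels[r.antecedent]) == 1]


# Hypothetical rules for a two-class problem.
RULES = [
    Rule(frozenset({"x", "y"}), "yes", 0.70),
    Rule(frozenset({"x", "y"}), "no", 0.30),  # contradicts the rule above
    Rule(frozenset({"z"}), "yes", 0.90),
]

if __name__ == "__main__":
    conf_first = drop_contradictions(confidence_filter(RULES, 0.6))
    contra_first = confidence_filter(drop_contradictions(RULES), 0.6)
    print(len(conf_first), len(contra_first))  # -> 2 1
```

With the confidence threshold applied first, the 0.30 rule is discarded and the surviving `{x, y} -> yes` rule no longer appears contradictory, so it is kept; removing contradictions first drops both `{x, y}` rules and leaves only `{z} -> yes`.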
Pages: 821-843
Page count: 23
Published in: Statistics and Computing, 2014, 24: 821-843