Evaluation and optimization of frequent, closed and maximal association rule based classification

Cited by: 8
Authors
Shaharanee, I. N. M. [1]
Hadzic, F. [2,3]
Affiliations
[1] Univ Utara Malaysia, Sch Quantitat Sci, Sintok, Malaysia
[2] Curtin Univ, Dept Comp, Perth, WA 6845, Australia
[3] Curtin Univ Technol, Dept Comp, Bentley, WA 6102, Australia
Keywords
Rule optimization; Interestingness measures; Statistical analysis; INTERESTINGNESS MEASURES; PATTERN; DISCOVERY
DOI
10.1007/s11222-013-9404-6
CLC number (Chinese Library Classification)
TP301 [Theory, methods]
Discipline classification code
081202
Abstract
Real-world applications of association rule mining suffer from the well-known problem of discovering a large number of rules, many of which are neither interesting nor useful for the application at hand. Algorithms for closed and maximal itemset mining significantly reduce the volume of discovered rules and the complexity of the task, but the implications of their use, and the important differences in generalization power, precision and recall when they are applied to classification problems, have not been examined. In this paper, we present a systematic evaluation of the association rules discovered by frequent, closed and maximal itemset mining algorithms, combining common data mining and statistical interestingness measures, and outline an appropriate sequence in which to apply them. The experiments are performed on a number of real-world datasets that represent diverse characteristics of data and items, and the rule sets are evaluated in detail both as a whole and with respect to individual classes. Empirical results confirm that, with a proper combination of data mining and statistical analysis, a large number of non-significant, redundant and contradictory rules can be eliminated while preserving relatively high precision and recall. More importantly, the results reveal the important characteristics of, and differences between, frequent, closed and maximal itemsets when used for the classification task, as well as the effect of incorporating statistical/heuristic measures to optimize such rule sets. With closed itemset mining already a preferred choice for reducing complexity and redundancy during rule generation, this study further confirms that closed itemset based association rules are also of better quality in terms of classification precision and recall, both on the rule set as a whole and on individual class examples.
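To make the frequent/closed/maximal distinction concrete, here is a minimal Python sketch under stated assumptions: the five toy transactions, the support threshold of 2 and the function names (`frequent_itemsets`, `closed_itemsets`, `maximal_itemsets`) are hypothetical, and the brute-force enumeration is not one of the mining algorithms evaluated in the paper. It uses the standard definitions: a frequent itemset is closed if no proper superset has the same support, and maximal if no proper superset is frequent, so every maximal itemset is also closed.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Brute-force enumeration: {itemset: support count} for every frequent itemset."""
    items = sorted(set().union(*transactions))
    frequent = {}
    for k in range(1, len(items) + 1):
        found = False
        for combo in combinations(items, k):
            candidate = frozenset(combo)
            count = sum(1 for t in transactions if candidate <= t)
            if count >= min_support:
                frequent[candidate] = count
                found = True
        if not found:  # no frequent k-itemsets implies no frequent (k+1)-itemsets
            break
    return frequent

def closed_itemsets(frequent):
    """Frequent itemsets with no proper superset of equal support."""
    return {s: c for s, c in frequent.items()
            if not any(s < t and frequent[t] == c for t in frequent)}

def maximal_itemsets(frequent):
    """Frequent itemsets with no frequent proper superset."""
    return {s: c for s, c in frequent.items()
            if not any(s < t for t in frequent)}

# Hypothetical toy transactions (not from the paper's datasets).
transactions = [frozenset(t) for t in ("abc", "abc", "ab", "ad", "c")]

frequent = frequent_itemsets(transactions, min_support=2)
closed = closed_itemsets(frequent)
maximal = maximal_itemsets(frequent)

def show(itemsets):
    """Render itemsets as sorted strings for readable output."""
    return sorted("".join(sorted(s)) for s in itemsets)

print(len(frequent), show(closed), show(maximal))
# 7 ['a', 'ab', 'abc', 'c'] ['abc']
```

On this data the 7 frequent itemsets reduce to 4 closed ones and a single maximal one. The maximal set no longer records the supports of {a}, {c} and {a, b}, which the closed set keeps; this is one possible intuition for why rules built only from maximal itemsets can lose recall.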
Maximal itemset based association rules, on the other hand, which are a subset of closed itemset based rules, prove insufficient in this regard and typically have worse recall and generalization power. The empirical results also show the drawback of applying the confidence measure at the outset to generate association rules, as is typically done within the association rule framework. Removing rules that fall below a given confidence threshold also removes any evidence in the data that contradicts the remaining, higher-confidence rules; precision can therefore be increased by discarding contradictory rules before the confidence constraint is applied.
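The ordering argument in the last sentence can be illustrated with a small Python sketch. It assumes, for illustration only, that a rule is contradictory when the same antecedent predicts different class labels in the data; the toy records, the 0.7 confidence threshold and the helper names (`class_rules`, `drop_contradictions`, `confidence_filter`) are hypothetical and are not taken from the paper.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    antecedent: frozenset  # items on the left-hand side
    label: str             # predicted class on the right-hand side
    support: int           # records containing the antecedent and this label
    confidence: float      # support / records containing the antecedent

def class_rules(records, antecedents):
    """Hypothetical helper: one rule per (antecedent, class) pair seen in the data."""
    rules = []
    for ante in antecedents:
        labels = Counter(label for items, label in records if ante <= items)
        covered = sum(labels.values())
        for label, n in labels.items():
            rules.append(Rule(ante, label, n, n / covered))
    return rules

def drop_contradictions(rules):
    """Discard every rule whose antecedent is paired with more than one class."""
    labels_seen = {}
    for r in rules:
        labels_seen.setdefault(r.antecedent, set()).add(r.label)
    return [r for r in rules if len(labels_seen[r.antecedent]) == 1]

def confidence_filter(rules, min_conf):
    """Keep rules whose confidence meets the threshold."""
    return [r for r in rules if r.confidence >= min_conf]

# Hypothetical class-labelled records: (items, class).
records = [
    (frozenset({"x", "y"}), "pos"),
    (frozenset({"x", "y"}), "pos"),
    (frozenset({"x", "y"}), "pos"),
    (frozenset({"x", "y"}), "neg"),
    (frozenset({"z"}), "neg"),
    (frozenset({"z"}), "neg"),
]
rules = class_rules(records, [frozenset({"x", "y"}), frozenset({"z"})])

conf_first = drop_contradictions(confidence_filter(rules, 0.7))
contra_first = confidence_filter(drop_contradictions(rules), 0.7)

for name, kept in [("confidence first", conf_first), ("contradictions first", contra_first)]:
    print(name, [(sorted(r.antecedent), r.label, r.confidence) for r in kept])
# confidence first [(['x', 'y'], 'pos', 0.75), (['z'], 'neg', 1.0)]
# contradictions first [(['z'], 'neg', 1.0)]
```

Filtering on confidence first keeps {x, y} → pos (confidence 0.75) because its contradicting rule {x, y} → neg (0.25) has already been removed. Checking for contradictions first discards both, leaving only {z} → neg.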
Pages: 821 - 843
Number of pages: 23
Related papers (50 in total)
  • [11] A SAT-Based Approach for Discovering Frequent, Closed and Maximal Patterns in a Sequence
    Coquery, Emmanuel
    Jabbour, Said
    Sais, Lakhdar
    Salhi, Yakoub
    20TH EUROPEAN CONFERENCE ON ARTIFICIAL INTELLIGENCE (ECAI 2012), 2012, 242 : 258 - +
  • [12] Association Rule Based Frequent Pattern Mining in Biological Sequences
    Salim, A.
    Chandra, Vinod S. S.
    2013 IEEE INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND COMPUTING RESEARCH (ICCIC), 2013, : 393 - 397
  • [13] Mining closed and maximal frequent induced free subtrees
    Shiozaki, Hitohiro
    Ozaki, Tomonobu
    Ohkawa, Takenao
    ICDM 2006: SIXTH IEEE INTERNATIONAL CONFERENCE ON DATA MINING, WORKSHOPS, 2006, : 14 - +
  • [14] CMTreeMiner: Mining both closed and maximal frequent subtrees
    Chi, Y
    Yang, YR
    Xia, Y
    Muntz, RR
    ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PROCEEDINGS, 2004, 3056 : 63 - 73
  • [15] Frequent Itemset Generation Using Association Rule Mining Based on Hybrid Neural Network Based Billiard Inspired Optimization
    Lakshmi, N.
    Krishnamurthy, M.
    JOURNAL OF CIRCUITS SYSTEMS AND COMPUTERS, 2022, 31 (08)
  • [16] Packer classification based on association rule mining
    Dam, Khanh Huu The
    Given-Wilson, Thomas
    Legay, Axel
    Veroneze, Rosana
    APPLIED SOFT COMPUTING, 2022, 127
  • [17] A greedy classification algorithm based on association rule
    Thabtah, F. A.
    Cowling, P. I.
    APPLIED SOFT COMPUTING, 2007, 7 (03) : 1102 - 1111
  • [18] Rule-Based Error Classification for Analyzing Differences in Frequent Errors
    Shirafuji, Atsushi
    Matsumoto, Taku
    Amin, Md Faizul Ibne
    Watanobe, Yutaka
    2023 IEEE INTERNATIONAL CONFERENCE ON TEACHING, ASSESSMENT AND LEARNING FOR ENGINEERING, TALE, 2023, : 588 - 594
  • [19] Incremental Closed Frequent Itemsets Mining-Based Approach Using Maximal Candidates
    Al-Zeiadi, Mohammed A.
    Al-Maqaleh, Basheer M.
    IEEE ACCESS, 2025, 13 : 34023 - 34037
  • [20] Novel graph classification approach based on frequent closed emerging patterns
    School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
    Jisuanji Yanjiu yu Fazhan (Computer Research and Development), 2007, (7) : 1169 - 1176