Mining significant association rules from uncertain data

Cited by: 8
Authors
Zhang, Anshu [1]
Shi, Wenzhong [1]
Webb, Geoffrey I. [2]
Affiliations
[1] Hong Kong Polytech Univ, Dept Land Surveying & Geoinformat, Kowloon, Hong Kong, Peoples R China
[2] Monash Univ, Fac Informat Technol, Melbourne, Vic 3800, Australia
Keywords
Pattern discovery; Association rules; Statistical evaluation; Uncertain data; ACCURACY;
DOI
10.1007/s10618-015-0446-6
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In association rule mining, the trade-off between avoiding harmful spurious rules and preserving authentic ones is a persistent barrier to obtaining reliable and useful results. The statistically sound technique for evaluating the statistical significance of association rules is superior at preventing spurious rules, yet it can also cause severe loss of true rules in the presence of data error. This study presents an improved method for statistically testing association rules on uncertain, erroneous data. An original mathematical model was established to describe how data error propagates through the computational procedures of the statistical test. Based on this error model, a scheme combining analytic and simulative processes was designed to correct the statistical test for distortions caused by data error. Experiments on both synthetic and real-world data show that the method significantly recovers the true rules lost to data error under the original statistically sound method (that is, it reduces type-2 error). Meanwhile, the new method maintains effective control over the familywise error rate, which is the distinctive advantage of the original statistically sound technique. Furthermore, the method is robust to inaccurate information about data error probabilities and to situations that violate the commonly accepted assumption that the error probabilities of different data items are independent. The method is particularly effective for the rules that are most practically meaningful yet sensitive to data error. The method shows promise for enhancing the value of association rule mining results and helping users make correct decisions.
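As a rough illustration of the setting the abstract describes, the sketch below tests a single rule A -> B for significance and shows how data error can cost a true rule. The abstract does not name the test or the multiple-testing correction, so the one-sided Fisher exact test and the Bonferroni adjustment used here are assumptions, as are all function names, counts, and the number of candidate rules. The sketch does not implement the paper's error-propagation model or its correction scheme.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    a = transactions with A and B,     b = with A but not B,
    c = transactions with B but not A, d = with neither.
    Returns P(X >= a) under the hypergeometric null of independence.
    """
    n = a + b + c + d
    row1 = a + b          # transactions containing A
    col1 = a + c          # transactions containing B
    upper = min(row1, col1)
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k)
               for k in range(a, upper + 1))
    return tail / comb(n, col1)


def is_significant(p_value, alpha, n_tests):
    """Bonferroni-adjusted decision (an assumed way to control the
    familywise error rate across n_tests candidate rules)."""
    return p_value <= alpha / n_tests


# Hypothetical counts for one rule, with identical margins in both tables.
# Errors in item values move transactions out of the "A and B" cell,
# which weakens the observed association.
p_clean = fisher_exact_greater(30, 10, 10, 50)   # error-free data
p_noisy = fisher_exact_greater(26, 14, 14, 46)   # same rule, erroneous data

n_candidate_rules = 100_000   # hypothetical size of the rule search space
print(p_clean, is_significant(p_clean, 0.05, n_candidate_rules))
print(p_noisy, is_significant(p_noisy, 0.05, n_candidate_rules))
```

Because both tables share the same margins, the noisy p-value is necessarily larger than the clean one, so a rule that passes a strict familywise threshold on clean data can fail it once errors are present. That is the type-2 error loss the abstract refers to.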
Pages: 928-963
Number of pages: 36