Efficient Mining of Frequent Item Sets on Large Uncertain Databases

Cited by: 55
Authors
Wang, Liang [1]
Cheung, David Wai-Lok [1]
Cheng, Reynold [1]
Lee, Sau Dan [1]
Yang, Xuan S. [1]
Affiliations
[1] Univ Hong Kong, Dept Comp Sci, Hong Kong, Hong Kong, Peoples R China
Keywords
Frequent item sets; uncertain data set; approximate algorithm; incremental mining
DOI
10.1109/TKDE.2011.165
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The data handled in emerging applications such as location-based services, sensor monitoring systems, and data integration are often inexact in nature. In this paper, we study the important problem of extracting frequent item sets from a large uncertain database, interpreted under the Possible World Semantics (PWS). This problem is technically challenging, since an uncertain database contains an exponential number of possible worlds. By observing that the mining process can be modeled as a Poisson binomial distribution, we develop an approximate algorithm that can efficiently and accurately discover frequent item sets in a large uncertain database. We also study the important issue of maintaining the mining result for a database that is evolving (e.g., by inserting a tuple). Specifically, we propose incremental mining algorithms, which enable Probabilistic Frequent Item set (PFI) results to be refreshed. This reduces the need to re-execute the whole mining algorithm on the new database, which is often expensive and unnecessary. We examine how an existing algorithm that extracts exact item sets, as well as our approximate algorithm, can support incremental mining. All our approaches support both tuple and attribute uncertainty, which are two common uncertain database models. We also perform extensive evaluation on real and synthetic data sets to validate our approaches.
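To make the Poisson binomial view in the abstract concrete, here is a minimal sketch rather than the paper's algorithm. It assumes a simplified tuple-uncertainty setting in which each transaction contains a given item set independently with a known probability, so the item set's support count follows a Poisson binomial distribution. The function names (`support_pmf`, `frequentness_exact`, `frequentness_poisson`), the sample probabilities, and the choice of a plain Poisson approximation with the same mean are all illustrative assumptions.

```python
import math


def support_pmf(probs):
    """Exact Poisson binomial pmf: pmf[k] = P(support count == k).

    probs[i] is the (assumed independent) probability that transaction i
    contains the item set. Runs in O(n^2) time via dynamic programming.
    """
    pmf = [1.0]  # with zero transactions, the support is 0 with certainty
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - p)   # transaction lacks the item set
            nxt[k + 1] += mass * p       # transaction contains the item set
        pmf = nxt
    return pmf


def frequentness_exact(probs, minsup):
    """P(support count >= minsup), computed from the exact pmf."""
    return sum(support_pmf(probs)[minsup:])


def frequentness_poisson(probs, minsup):
    """Approximate P(support count >= minsup) using a Poisson distribution
    whose mean equals the expected support (an illustrative approximation)."""
    mu = sum(probs)
    below = sum(math.exp(-mu) * mu ** k / math.factorial(k)
                for k in range(minsup))
    return 1.0 - below


if __name__ == "__main__":
    probs = [0.9, 0.8, 0.5, 0.3, 0.1]  # hypothetical per-transaction probabilities
    minsup = 2
    print("exact:  ", frequentness_exact(probs, minsup))
    print("poisson:", frequentness_poisson(probs, minsup))
```

The exact routine needs quadratic time per item set, while the approximation needs only the expected support, which is a single sum that can be updated cheaply when a tuple is inserted. An item set would be reported as probabilistically frequent when the computed probability meets a user-chosen threshold.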
Pages: 2170 - 2183
Page count: 14
Related papers
50 in total
  • [1] Efficient Mining of Weighted Frequent Itemsets in Uncertain Databases
    Lin, Jerry Chun-Wei
    Gan, Wensheng
    Fournier-Viger, Philippe
    Hong, Tzung-Pei
    [J]. MACHINE LEARNING AND DATA MINING IN PATTERN RECOGNITION (MLDM 2016), 2016, 9729 : 236 - 250
  • [2] On Efficient Mining of Frequent Itemsets from Big Uncertain Databases
    Shah, Ahsan
    Halim, Zahid
    [J]. JOURNAL OF GRID COMPUTING, 2019, 17 (04) : 831 - 850
  • [3] Efficient weighted probabilistic frequent itemset mining in uncertain databases
    Li, Zhiyang
    Chen, Fengjuan
    Wu, Junfeng
    Liu, Zhaobin
    Liu, Weijiang
    [J]. EXPERT SYSTEMS, 2021, 38 (05)
  • [4] On Efficient Mining of Frequent Itemsets from Big Uncertain Databases
    Ahsan Shah
    Zahid Halim
    [J]. Journal of Grid Computing, 2019, 17 : 831 - 850
  • [5] Mining Probabilistically Frequent Sequential Patterns in Large Uncertain Databases
    Zhao, Zhou
    Yan, Da
    Ng, Wilfred
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2014, 26 (05) : 1171 - 1184
  • [6] An Efficient Algorithm for Mining Large Item Sets
    Zheng, Hong-Zhen
    Chu, Dian-Hui
    Zhan, De-Chen
    Xu, Xiao-Fei
    [J]. FIFTH INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS AND KNOWLEDGE DISCOVERY, VOL 2, PROCEEDINGS, 2008, : 561 - 564
  • [7] An efficient algorithm for mining large item sets
    Zheng, Hong-Zhen
    Chu, Dian-Hui
    Zhan, De-Chen
    [J]. 3RD INT CONF ON CYBERNETICS AND INFORMATION TECHNOLOGIES, SYSTEMS, AND APPLICAT/4TH INT CONF ON COMPUTING, COMMUNICATIONS AND CONTROL TECHNOLOGIES, VOL 2, 2006, : 151 - +
  • [8] An Efficient Approach for Mining Frequent Item sets with Transaction Deletion Operation
    Bay Vo
    Thien-Phuong Le
    Tzung-Pei Hong
    Bac Le
    Jung, Jason
    [J]. INTERNATIONAL ARAB JOURNAL OF INFORMATION TECHNOLOGY, 2016, 13 (05) : 595 - 602
  • [9] Mining Frequent Itemsets in Correlated Uncertain Databases
    Yong-Xin Tong
    Lei Chen
    Jieying She
    [J]. Journal of Computer Science and Technology, 2015, 30 : 696 - 712
  • [10] Mining Frequent Itemsets over Uncertain Databases
    Tong, Yongxin
    Chen, Lei
    Cheng, Yurong
    Yu, Philip S.
    [J]. PROCEEDINGS OF THE VLDB ENDOWMENT, 2012, 5 (11) : 1650 - 1661