Efficient Mining of Frequent Item Sets on Large Uncertain Databases

Cited by: 55
Authors
Wang, Liang [1]
Cheung, David Wai-Lok [1]
Cheng, Reynold [1]
Lee, Sau Dan [1]
Yang, Xuan S. [1]
Affiliations
[1] Univ Hong Kong, Dept Comp Sci, Hong Kong, Hong Kong, Peoples R China
Keywords
Frequent item sets; uncertain data set; approximate algorithm; incremental mining
DOI
10.1109/TKDE.2011.165
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The data handled in emerging applications such as location-based services, sensor monitoring systems, and data integration are often inexact in nature. In this paper, we study the important problem of extracting frequent item sets from a large uncertain database, interpreted under the Possible World Semantics (PWS). This problem is technically challenging, since an uncertain database contains an exponential number of possible worlds. By observing that the mining process can be modeled as a Poisson binomial distribution, we develop an approximate algorithm that can efficiently and accurately discover frequent item sets in a large uncertain database. We also study the important issue of maintaining the mining result for a database that is evolving (e.g., by inserting a tuple). Specifically, we propose incremental mining algorithms, which enable Probabilistic Frequent Item set (PFI) results to be refreshed. This reduces the need to re-execute the whole mining algorithm on the new database, which is often more expensive and unnecessary. We examine how an existing algorithm that extracts exact item sets, as well as our approximate algorithm, can support incremental mining. All our approaches support both tuple and attribute uncertainty, which are two common uncertain database models. We also perform extensive evaluation on real and synthetic data sets to validate our approaches.
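As a rough illustration of the Poisson binomial view mentioned in the abstract: if each tuple contains a given item set independently with its own probability, the support count is a sum of independent Bernoulli variables. The sketch below is not the authors' algorithm. The function names, the independence assumption, and the choice of a Poisson approximation with rate equal to the expected support are all assumptions made for illustration; the abstract does not specify them.

```python
import math

def exact_tail(probs, minsup):
    """P(support >= minsup) for a sum of independent Bernoulli(p_i) variables."""
    # dist[k] = probability that exactly k of the tuples processed so far
    # contain the item set (dynamic programming over tuples).
    dist = [1.0]
    for p in probs:
        nxt = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            nxt[k] += mass * (1.0 - p)   # tuple does not contain the item set
            nxt[k + 1] += mass * p       # tuple contains the item set
        dist = nxt
    return sum(dist[minsup:])

def poisson_tail(probs, minsup):
    """Approximate the same tail with a Poisson whose rate is the expected support."""
    mu = sum(probs)
    term = math.exp(-mu)                 # Poisson pmf at k = 0
    cdf = 0.0
    for k in range(minsup):
        cdf += term
        term *= mu / (k + 1)             # pmf at k + 1 from pmf at k
    return 1.0 - cdf

# Hypothetical data: 200 tuples, each containing the item set with probability 0.05.
probs = [0.05] * 200
print(exact_tail(probs, 15), poisson_tail(probs, 15))
```

The exact routine costs time quadratic in the number of tuples, while the approximation needs only the expected support, which hints at why an approximate approach can scale to large databases.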
Pages: 2170-2183
Page count: 14
Related papers
50 in total
  • [41] Efficient dynamic mining of constrained frequent sets
    Lakshmanan, LVS
    Leung, CKS
    Ng, RT
    [J]. ACM TRANSACTIONS ON DATABASE SYSTEMS, 2003, 28 (04) : 337 - 389
  • [42] Distributed Mining of Constrained Frequent Sets from Uncertain Data
    Cuzzocrea, Alfredo
    Leung, Carson K.
    [J]. ALGORITHMS AND ARCHITECTURES FOR PARALLEL PROCESSING, PT I: ICA3PP 2011, 2011, 7916 : 40 - +
  • [43] Mining Frequent Gradual Itemsets from Large Databases
    Di-Jorio, Lisa
    Laurent, Anne
    Teisseire, Maguelonne
    [J]. ADVANCES IN INTELLIGENT DATA ANALYSIS VIII, PROCEEDINGS, 2009, 5772 : 297 - +
  • [44] Parallel and Distributed Frequent Pattern Mining in Large Databases
    Tanbeer, Syed Khairuzzaman
    Ahmed, Chowdhury Farhan
    Jeong, Byeong-Soo
    [J]. HPCC: 2009 11TH IEEE INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING AND COMMUNICATIONS, 2009, : 407 - 414
  • [45] Probabilistic Frequent Itemset Mining Algorithm over Uncertain Databases with Sampling
    Li, Hai-Feng
    Zhang, Ning
    Zhang, Yue-Jin
    Wang, Yue
    [J]. FUZZY SYSTEMS AND DATA MINING II, 2016, 293 : 159 - 166
  • [46] Mining Weighted Frequent Itemsets without Candidate Generation in Uncertain Databases
    Lin, Jerry Chun-Wei
    Gan, Wensheng
    Fournier-Viger, Philippe
    Hong, Tzung-Pei
    Chao, Han-Chieh
    [J]. INTERNATIONAL JOURNAL OF INFORMATION TECHNOLOGY & DECISION MAKING, 2017, 16 (06) : 1549 - 1579
  • [47] Mining top-k frequent patterns from uncertain databases
    Tuong Le
    Bay Vo
    Van-Nam Huynh
    Ngoc Thanh Nguyen
    Baik, Sung Wook
    [J]. APPLIED INTELLIGENCE, 2020, 50 (05) : 1487 - 1497
  • [48] Probabilistic maximal frequent itemset mining methods over uncertain databases
    Li, Haifeng
    Hai, Mo
    Zhang, Ning
    Zhu, Jianming
    Wang, Yue
    Cao, Huaihu
    [J]. INTELLIGENT DATA ANALYSIS, 2019, 23 (06) : 1219 - 1241
  • [49] Mining top-k frequent patterns from uncertain databases
    Tuong Le
    Bay Vo
    Van-Nam Huynh
    Ngoc Thanh Nguyen
    Sung Wook Baik
    [J]. Applied Intelligence, 2020, 50 : 1487 - 1497
  • [50] DWMiner: A tool for mining frequent item sets efficiently in data warehouses
    Almentero, Bruno Kinder
    Evsukoff, Alexandre Goncalves
    Mattoso, Marta
    [J]. HIGH PERFORMANCE COMPUTING FOR COMPUTATIONAL SCIENCE - VECPAR 2006, 2007, 4395 : 212 - +