Mining Frequent Itemsets in Correlated Uncertain Databases

Cited by: 0
Authors
Yong-Xin Tong
Lei Chen
Jieying She
Affiliations
[1] Beihang University, State Key Laboratory of Software Development Environment, School of Computer Science and Engineering
[2] The Hong Kong University of Science and Technology, Department of Computer Science and Engineering
Keywords
correlation; uncertain data; probabilistic frequent itemset;
DOI
Not available
Abstract
Recently, with the growing popularity of the Internet of Things (IoT) and pervasive computing, large amounts of uncertain data, such as RFID data, sensor data, and real-time video data, have been collected. As one of the most fundamental problems in uncertain data mining, uncertain frequent pattern mining has attracted much attention in the database and data mining communities. Although several solutions for uncertain frequent pattern mining exist, most of them assume that data objects are independent of each other, an assumption that does not hold in most real-world scenarios. Consequently, current methods based on the independence assumption may produce inaccurate results on correlated uncertain data. In this paper, we focus on the problem of mining frequent itemsets over correlated uncertain data, where correlation may exist between any pair of uncertain data objects (transactions). We propose a novel probabilistic model, called the Correlated Frequent Probability model (CFP model), to represent the probability distribution of support in a given correlated uncertain dataset. Based on the support distribution derived from the CFP model, we observe that some probabilistic frequent itemsets are frequent only within a small number of transactions with high positive correlation. In contrast, itemsets that are globally probabilistic frequent are more significant, because they help eliminate the influence of the noise and correlation present in the data. To reduce redundant frequent itemsets, we further propose a new type of pattern, called global probabilistic frequent itemsets, which identifies itemsets that remain frequent in every group of transactions when the whole correlated uncertain database is divided into disjoint groups based on their correlation. To speed up the mining process, we also design a dynamic programming solution, as well as two pruning and bounding techniques. Extensive experiments on both real and synthetic datasets verify the effectiveness and efficiency of the proposed model and algorithms.
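The abstract contrasts the CFP model with existing methods that assume independent transactions. For reference, the following is a minimal Python sketch of that independence baseline, not of the paper's CFP model: if each transaction i contains an itemset with probability p_i and transactions are assumed mutually independent, the itemset's support is a sum of independent Bernoulli variables, and its distribution can be built one transaction at a time with a simple dynamic program. The function names and the parameters `minsup` (minimum support count) and `tau` (probability threshold) are illustrative assumptions, not identifiers from the paper.

```python
from typing import List


def support_distribution(probs: List[float]) -> List[float]:
    """Return dist with dist[k] = P(support == k).

    Assumes the transactions are mutually independent (the baseline that
    the CFP model relaxes). probs[i] is the probability that transaction i
    contains the itemset.
    """
    dist = [1.0]  # before any transaction is seen, support is 0 with probability 1
    for p in probs:
        new = [0.0] * (len(dist) + 1)
        for k, q in enumerate(dist):
            new[k] += q * (1.0 - p)  # this transaction does not contain the itemset
            new[k + 1] += q * p      # this transaction contains the itemset
        dist = new
    return dist


def frequent_probability(probs: List[float], minsup: int) -> float:
    """P(support >= minsup) under the independence assumption (minsup >= 0)."""
    return sum(support_distribution(probs)[minsup:])


def is_probabilistic_frequent(probs: List[float], minsup: int, tau: float) -> bool:
    """Hypothetical test: the itemset is frequent with probability at least tau."""
    return frequent_probability(probs, minsup) >= tau
```

For three transactions containing the itemset with probabilities 0.9, 0.5, and 0.8, `frequent_probability([0.9, 0.5, 0.8], 2)` evaluates to approximately 0.85. The dynamic program takes O(n^2) time for n transactions; when transactions are correlated, as the abstract argues is common, this product-of-marginals update is no longer valid, which is the gap the CFP model is designed to address.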
Pages: 696-712
Number of pages: 16
Related papers (50 in total)
  • [21] Mining Frequent Gradual Itemsets from Large Databases
    Di-Jorio, Lisa
    Laurent, Anne
    Teisseire, Maguelonne
    [J]. ADVANCES IN INTELLIGENT DATA ANALYSIS VIII, PROCEEDINGS, 2009, 5772 : 297 - +
  • [22] Mining Maximal Frequent Itemsets over Sampling Databases
    Li, Haifeng
    [J]. PROCEEDINGS OF THE 2015 2ND INTERNATIONAL FORUM ON ELECTRICAL ENGINEERING AND AUTOMATION (IFEEA 2015), 2016, 54 : 28 - 31
  • [23] Mining of Frequent Itemsets from Streams of Uncertain Data
    Leung, Carson Kai-Sang
    Hao, Boyu
    [J]. ICDE: 2009 IEEE 25TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING, VOLS 1-3, 2009, : 1663 - 1670
  • [24] Mining Closed High Utility Itemsets in Uncertain Databases
    Nguyen Bui
    Bay Vo
    Van-Nam Huynh
    Lin, Chun-Wei
    Nguyen, Loan T. T.
    [J]. PROCEEDINGS OF THE SEVENTH SYMPOSIUM ON INFORMATION AND COMMUNICATION TECHNOLOGY (SOICT 2016), 2016, : 7 - 14
  • [25] Mining High Utility Itemsets over Uncertain Databases
    Lan, Yuqing
    Wang, Yang
    Wang, Yanni
    Yi, Shengwei
    Yu, Dan
    [J]. 2015 INTERNATIONAL CONFERENCE ON CYBER-ENABLED DISTRIBUTED COMPUTING AND KNOWLEDGE DISCOVERY, 2015, : 235 - 238
  • [26] Mining frequent closed itemsets in large databases by hierarchical partitioning
    Tseng, Fan-Chen
    [J]. PROCEEDINGS OF 2007 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-7, 2007, : 1832 - 1837
  • [27] Mining frequent weighted utility itemsets in hierarchical quantitative databases
    Nguyen, Ham
    Le, Tuong
    Nguyen, Minh
    Fournier-Viger, Philippe
    Tseng, Vincent S. S.
    Vo, Bay
    [J]. KNOWLEDGE-BASED SYSTEMS, 2022, 237
  • [28] Mining frequent itemsets in large databases: The hierarchical partitioning approach
    Tseng, Fan-Chen
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2013, 40 (05) : 1654 - 1661
  • [29] Mining maximal frequent itemsets for large scale transaction databases
    Xia, R
    Yuan, W
    Ding, SC
    Liu, J
    Zhou, HB
    [J]. PROCEEDINGS OF THE 2004 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-7, 2004, : 1480 - 1485
  • [30] Probabilistic Frequent Itemset Mining in Uncertain Databases
    Bernecker, Thomas
    Kriegel, Hans-Peter
    Renz, Matthias
    Verhein, Florian
    Zuefle, Andreas
    [J]. KDD-09: 15TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, 2009, : 119 - 127