A data mining proxy approach for efficient frequent itemset mining

Cited by: 2
Authors
Yu, Jeffrey Xu [1]
Li, Zhiheng [1]
Liu, Guimei [2]
Affiliations
[1] Chinese Univ Hong Kong, Hong Kong, Hong Kong, Peoples R China
[2] Natl Univ Singapore, Singapore 117548, Singapore
Source
VLDB JOURNAL | 2008 / Vol. 17 / Issue 04
Keywords
Data Mining; Association Rule; Frequent Pattern; Minimum Support; Frequent Itemset;
DOI
10.1007/s00778-007-0047-0
Chinese Library Classification (CLC) number
TP3 [Computing Technology, Computer Technology];
Subject classification code
0812;
Abstract
Data mining has attracted considerable research effort during the past decade. However, little work has been reported on efficiently supporting a large number of users who periodically issue different data mining queries as new needs arise and as data is updated. Our work is motivated by the fact that the pattern-growth method, which constructs an initial tree and mines frequent patterns on top of that tree, is one of the most efficient methods for frequent pattern mining. In this paper, we present a data mining proxy approach that reduces the I/O cost of constructing an initial tree by utilizing trees that are already resident in memory. The tree we construct is the smallest for a given data mining query. In addition, our proxy approach can also reduce the CPU cost of mining patterns, because the cost of mining depends on the sizes of the trees. The focus of the work is to construct an initial tree efficiently. We propose three tree operations to construct a tree. With a unique coding scheme, we can efficiently project subtrees from on-disk trees or in-memory trees. Our performance study indicates that the data mining proxy significantly reduces the I/O cost of constructing trees and the CPU cost of mining patterns over the constructed trees.
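As a rough illustration of the "initial tree" that the pattern-growth method builds, the following Python sketch constructs a generic FP-tree-style prefix tree from a list of transactions and counts its nodes. It is not the paper's proxy approach: the three tree operations and the coding scheme are not described in this record, and the names `Node`, `build_fp_tree`, and `tree_size` are hypothetical. It only shows, under these assumptions, why a higher minimum support yields a smaller tree and hence less mining work.

```python
from collections import Counter


class Node:
    """One node of the prefix tree (hypothetical helper, not from the paper)."""

    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions, min_support):
    """Build a prefix tree over the frequent items of each transaction."""
    # Pass 1: count the number of transactions containing each item.
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {item for item, c in counts.items() if c >= min_support}

    root = Node(None)
    # Pass 2: insert each transaction's frequent items in one global order
    # (descending support, ties broken by item name) so prefixes are shared.
    for t in transactions:
        items = sorted(
            (i for i in set(t) if i in frequent),
            key=lambda i: (-counts[i], i),
        )
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
            child.count += 1
            node = child
    return root


def tree_size(node):
    """Number of nodes below `node` (the root itself is not counted)."""
    return sum(1 + tree_size(c) for c in node.children.values())


transactions = [
    ["a", "b", "c"],
    ["a", "b"],
    ["a", "c"],
    ["b", "d"],
    ["a", "b", "c", "d"],
]
small_threshold_tree = build_fp_tree(transactions, min_support=2)
large_threshold_tree = build_fp_tree(transactions, min_support=3)
print(tree_size(small_threshold_tree), tree_size(large_threshold_tree))
```

On this toy data the tree built with the lower minimum support has more nodes than the one built with the higher minimum support, which is the size dependence the abstract refers to.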
Pages: 947 - 970
Page count: 24
Related papers
50 in total
  • [41] An efficient approach to mining frequent itemsets on data streams
    Ansari, Sara
    Sadreddini, Mohammad Hadi
    World Academy of Science, Engineering and Technology, 2009, 37 : 489 - 495
  • [42] A Survey Paper on Frequent Itemset Mining
    Sastry, J. S. V. R. S.
    Suresh, V
    INTERNATIONAL CONFERENCE ON COMPUTER VISION AND MACHINE LEARNING, 2019, 1228
  • [43] Frequent Itemset Mining in Multirelational Databases
    Jimenez, Aida
    Berzal, Fernando
    Cubero, Juan-Carlos
    FOUNDATIONS OF INTELLIGENT SYSTEMS, PROCEEDINGS, 2009, 5722 : 15 - 24
  • [44] Verified Programs for Frequent Itemset Mining
    Loulergue, Frederic
    Whitney, Christopher D.
    2018 IEEE SMARTWORLD, UBIQUITOUS INTELLIGENCE & COMPUTING, ADVANCED & TRUSTED COMPUTING, SCALABLE COMPUTING & COMMUNICATIONS, CLOUD & BIG DATA COMPUTING, INTERNET OF PEOPLE AND SMART CITY INNOVATION (SMARTWORLD/SCALCOM/UIC/ATC/CBDCOM/IOP/SCI), 2018, : 1516 - 1523
  • [45] Oracle and Vertica for Frequent Itemset Mining
    Kyurkchiev, Hristo
    Kaloyanova, Kalinka
    DATA MINING AND BIG DATA, DMBD 2016, 2016, 9714 : 77 - 85
  • [46] A primer to frequent itemset mining for bioinformatics
    Naulaerts, Stefan
    Meysman, Pieter
    Bittremieux, Wout
    Trung Nghia Vu
    Vanden Berghe, Wim
    Goethals, Bart
    Laukens, Kris
    BRIEFINGS IN BIOINFORMATICS, 2015, 16 (02) : 216 - 231
  • [47] A parallel algorithm for frequent itemset mining
    Li, L
    Zhai, DH
    Fan, J
    PARALLEL AND DISTRIBUTED COMPUTING, APPLICATIONS AND TECHNOLOGIES, PDCAT'2003, PROCEEDINGS, 2003, : 868 - 871
  • [48] Frequent itemset mining with parallel RDBMS
    Shang, XQ
    Sattler, KU
    ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PROCEEDINGS, 2005, 3518 : 539 - 544
  • [49] Frequent closed informative itemset mining
    Fu, Huaiguo
    Foghlu, Micheal O.
    Donnelly, Willie
    CIS: 2007 INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND SECURITY, PROCEEDINGS, 2007, : 232 - +
  • [50] Video mining with frequent itemset configurations
    Quack, Till
    Ferrari, Vittorio
    Van Gool, Luc
    IMAGE AND VIDEO RETRIEVAL, PROCEEDINGS, 2006, 4071 : 360 - 369