A data mining proxy approach for efficient frequent itemset mining

Cited by: 2
Authors
Yu, Jeffrey Xu [1 ]
Li, Zhiheng [1 ]
Liu, Guimei [2 ]
Affiliations
[1] Chinese Univ Hong Kong, Hong Kong, Hong Kong, Peoples R China
[2] Natl Univ Singapore, Singapore 117548, Singapore
Source
VLDB JOURNAL | 2008 / Vol. 17 / Issue 04
Keywords
Data Mining; Association Rule; Frequent Pattern; Minimum Support; Frequent Itemset;
DOI
10.1007/s00778-007-0047-0
Chinese Library Classification
TP3 [Computing Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
Data mining has attracted considerable research effort during the past decade. However, little work has been reported on efficiently supporting a large number of users who issue different data mining queries periodically, as new needs arise and as data is updated. Our work is motivated by the fact that the pattern-growth method, which constructs an initial tree and mines frequent patterns on top of that tree, is one of the most efficient methods for frequent pattern mining. In this paper, we present a data mining proxy approach that reduces the I/O cost of constructing an initial tree by utilizing trees that are already resident in memory. The tree we construct is the smallest for a given data mining query. In addition, our proxy approach also reduces the CPU cost of mining patterns, because the cost of mining depends on the sizes of the trees. The focus of the work is to construct an initial tree efficiently. We propose three tree operations to construct a tree. With a unique coding scheme, we can efficiently project subtrees from on-disk trees or in-memory trees. Our performance study indicates that the data mining proxy significantly reduces the I/O cost of constructing trees and the CPU cost of mining patterns over the trees constructed.
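The abstract's point that mining cost depends on tree size can be illustrated with a minimal sketch of generic FP-tree construction, the standard initial tree used by pattern-growth mining. This is not the paper's proxy approach, its three tree operations, or its coding scheme, none of which are described in this record. The names `Node`, `build_fp_tree`, and `tree_size` are hypothetical, and the two-pass construction with items ordered by descending support is an assumption based on the textbook FP-growth algorithm.

```python
from collections import Counter


class Node:
    """One FP-tree node: an item, its count, and its children keyed by item."""

    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions, min_support):
    """Build a textbook FP-tree (assumed construction, not the paper's method)."""
    # Pass 1: count in how many transactions each item appears.
    support = Counter(item for t in transactions for item in set(t))
    frequent = {i: c for i, c in support.items() if c >= min_support}

    root = Node(None)
    # Pass 2: insert each transaction, keeping only frequent items,
    # ordered by descending support (ties broken by item name).
    for t in transactions:
        items = sorted(
            (i for i in set(t) if i in frequent),
            key=lambda i: (-frequent[i], i),
        )
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
            child.count += 1
            node = child
    return root


def tree_size(node):
    """Number of nodes below `node` (the root itself is not counted)."""
    return sum(1 + tree_size(c) for c in node.children.values())


transactions = [["a", "b"], ["b", "c"], ["a", "b", "c"], ["b"]]
small = build_fp_tree(transactions, min_support=3)
large = build_fp_tree(transactions, min_support=2)
print(tree_size(small), tree_size(large))
```

In this toy data set, raising the minimum support prunes items before insertion, so the resulting tree has fewer nodes. That is the sense in which a tree tailored to a given query can be cheaper to mine than a larger, more general one.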
Pages: 947 - 970
Page count: 24
Related papers
50 in total
  • [21] Anytime Frequent Itemset Mining of Transactional Data Streams
    Goyal, Poonam
    Challa, Jagat Sesh
    Shrivastava, Shivin
    Goyal, Navneet
    BIG DATA RESEARCH, 2020, 21
  • [22] Novel algorithm for frequent itemset mining in data warehouses
    Xu L.-J.
    Xie K.-L.
    Journal of Zhejiang University-SCIENCE A, 2006, 7 (2): 216 - 224
  • [23] Parallel Incremental Frequent Itemset Mining for Large Data
    Song, Yu-Geng
    Cui, Hui-Min
    Feng, Xiao-Bing
    JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY, 2017, 32 (02) : 368 - 385
  • [24] A Survey on Closed Frequent Itemset Mining on Data Streams
    Bai, Pavitra S.
    Kumar, Ravi G. K.
    PROCEEDINGS OF THE 2016 2ND INTERNATIONAL CONFERENCE ON CONTEMPORARY COMPUTING AND INFORMATICS (IC3I), 2016, : 542 - 547
  • [25] Frequent Itemset Mining in High Dimensional Data: A Review
    Zaki, Fatimah Audah Md
    Zulkurnain, Nurul Fariza
    COMPUTATIONAL SCIENCE AND TECHNOLOGY, 2019, 481 : 325 - 334
  • [26] A Frequent and Rare Itemset Mining Approach to Transaction Clustering
    Tummala, Kuladeep
    Oswald, C.
    Sivaselvan, B.
    DATA SCIENCE ANALYTICS AND APPLICATIONS, DASAA 2017, 2018, 804 : 8 - 18
  • [27] Approximate Frequent Itemset Mining for Streaming Data on FPGA
    Li, Yubin
    Sun, Yuliang
    Dai, Guohao
    Xu, Qiang
    Wang, Yu
    Yang, Huazhong
    2016 26TH INTERNATIONAL CONFERENCE ON FIELD PROGRAMMABLE LOGIC AND APPLICATIONS (FPL), 2016,
  • [28] A new approach for mining frequent K-itemset
    Sankar, H. Ravi
    Naidu, M. M.
    WCECS 2007: WORLD CONGRESS ON ENGINEERING AND COMPUTER SCIENCE, 2007, : 718 - +
  • [29] PARASOL: a hybrid approximation approach for scalable frequent itemset mining in streaming data
    Yamamoto, Yoshitaka
    Tabei, Yasuo
    Iwanuma, Koji
    JOURNAL OF INTELLIGENT INFORMATION SYSTEMS, 2020, 55 (01) : 119 - 147
  • [30] PARASOL: a hybrid approximation approach for scalable frequent itemset mining in streaming data
    Yoshitaka Yamamoto
    Yasuo Tabei
    Koji Iwanuma
    Journal of Intelligent Information Systems, 2020, 55 : 119 - 147