H-mine: Hyper-structure mining of frequent patterns in large databases

Cited by: 174

Authors:

Pei, J [1]

Han, JW [1]

Lu, HJ [1]

Nishio, S [1]

Tang, SW [1]

Yang, DQ [1]

Affiliation:

[1] Peking Univ, Beijing 100871, Peoples R China

Source:

2001 IEEE INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS | 2001

Keywords:

DOI:

10.1109/ICDM.2001.989550

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Methods for efficient mining of frequent patterns have been studied extensively by many researchers. However, the previously proposed methods still encounter some performance bottlenecks when mining databases with different data characteristics, such as dense vs. sparse, long vs. short patterns, memory-based vs. disk-based, etc. In this study, we propose a simple and novel hyper-linked data structure, H-struct, and a new mining algorithm, H-mine, which takes advantage of this data structure and dynamically adjusts links in the mining process. A distinct feature of this method is that it has very limited and precisely predictable space overhead and runs really fast in a memory-based setting. Moreover, it can be scaled up to very large databases by database partitioning, and when the data set becomes dense, (conditional) FP-trees can be constructed dynamically as part of the mining process. Our study shows that H-mine has high performance in various kinds of data, outperforms the previously developed algorithms in different settings, and is highly scalable in mining large databases. This study also proposes a new data mining methodology, space-preserving mining, which may have strong impact in the future development of efficient and scalable data mining methods.
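
The idea of mining through links into stored transactions, rather than through copied projected databases, can be illustrated with a small sketch. This is not the authors' H-mine algorithm or H-struct layout, which the abstract does not specify; it is a simplified pattern-growth miner written under the assumption that each transaction's frequent items are stored once in a fixed order and that every projection is just a list of (row, position) links into that storage. The names `mine_frequent`, `rows`, `links`, and `grow` are hypothetical.

```python
from collections import defaultdict


def mine_frequent(transactions, min_support):
    """Return {frozenset(itemset): support} for all frequent itemsets.

    Simplified, link-based pattern growth (illustrative only, not H-mine).
    """
    # Pass 1: count each item once per transaction.
    counts = defaultdict(int)
    for t in transactions:
        for item in set(t):
            counts[item] += 1

    # Fix a global order over the frequent items.
    frequent = sorted(i for i, c in counts.items() if c >= min_support)
    rank = {item: r for r, item in enumerate(frequent)}

    # Store the frequent-item projection of every transaction exactly once.
    rows = [
        sorted((i for i in set(t) if i in rank), key=rank.__getitem__)
        for t in transactions
    ]

    results = {}

    def grow(prefix, links):
        # Each link (r, pos) points at the part of rows[r] that follows
        # the last item of the current prefix; no row is ever copied.
        queues = defaultdict(list)
        for r, pos in links:
            row = rows[r]
            for p in range(pos, len(row)):
                queues[row[p]].append((r, p + 1))
        for item in sorted(queues, key=rank.__getitem__):
            queue = queues[item]
            if len(queue) >= min_support:
                pattern = prefix + (item,)
                results[frozenset(pattern)] = len(queue)
                grow(pattern, queue)

    grow((), [(r, 0) for r in range(len(rows))])
    return results


if __name__ == "__main__":
    demo = [["a", "c", "d"], ["a", "d"], ["c", "d", "e"], ["a", "c", "d", "e"]]
    for itemset, support in sorted(
        mine_frequent(demo, 2).items(), key=lambda kv: sorted(kv[0])
    ):
        print(sorted(itemset), support)
```

In this sketch the only data that grows during mining is the lists of links, while the transaction storage itself stays fixed. That is the sense in which it loosely mirrors the bounded space overhead the abstract describes; the real algorithm adjusts links in place rather than building new lists.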


Pages: 441-448

Page count: 8

50 entries in total

[21] Mining frequent trajectory patterns in spatial-temporal databases
Lee, Anthony J. T.
Chen, Yi-An
Ip, Weng-Chong
[J]. INFORMATION SCIENCES, 2009, 179 (13) : 2218 - 2231
[22] AN EFFICIENT ITEMSET REPRESENTATION FOR MINING FREQUENT PATTERNS IN TRANSACTIONAL DATABASES
Tomovic, Savo
Stanisic, Predrag
[J]. COMPUTING AND INFORMATICS, 2018, 37 (04) : 894 - 914
[23] TidFP: Mining Frequent Patterns in Different Databases with Transaction ID
Ezeife, C. I.
Zhang, Dan
[J]. DATA WAREHOUSING AND KNOWLEDGE DISCOVERY, PROCEEDINGS, 2009, 5691 : 125 - 137
[24] Parallel and Distributed Algorithms for Frequent Pattern Mining in Large Databases
Tanbeer, Syed Khairuzzaman
Ahmed, Chowdhury Farhan
Jeong, Byeong-Soo
[J]. IETE TECHNICAL REVIEW, 2009, 26 (01) : 55 - 66
[25] Mining frequent closed itemsets in large databases by hierarchical partitioning
Tseng, Fan-Chen
[J]. PROCEEDINGS OF 2007 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-7, 2007, : 1832 - 1837
[26] Mining frequent approximate patterns in large networks
Driss, Kaouthar
Boulila, Wadii
Leborgne, Aurelie
Gancarski, Pierre
[J]. INTERNATIONAL JOURNAL OF IMAGING SYSTEMS AND TECHNOLOGY, 2021, 31 (03) : 1265 - 1279
[27] Efficient Mining of Frequent Item Sets on Large Uncertain Databases
Wang, Liang
Cheung, David Wai-Lok
Cheng, Reynold
Lee, Sau Dan
Yang, Xuan S.
[J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2012, 24 (12) : 2170 - 2183
[28] Mining frequent itemsets in large databases: The hierarchical partitioning approach
Tseng, Fan-Chen
[J]. EXPERT SYSTEMS WITH APPLICATIONS, 2013, 40 (05) : 1654 - 1661
[29] Mining maximal frequent itemsets for large scale transaction databases
Xia, R
Yuan, W
Ding, SC
Liu, J
Zhou, HB
[J]. PROCEEDINGS OF THE 2004 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-7, 2004, : 1480 - 1485
[30] Incremental mining of sequential patterns in large databases
Masseglia, F
Poncelet, P
Teisseire, M
[J]. DATA & KNOWLEDGE ENGINEERING, 2003, 46 (01) : 97 - 121
