On efficiently summarizing categorical databases

Cited by: 0

Authors:

Jianyong Wang

George Karypis

Affiliations:

[1] Tsinghua University, Department of Computer Science and Technology

[2] University of Minnesota, Department of Computer Science, Digital Technology Center and Army HPC Research Center

Source:

Knowledge and Information Systems | 2006 / Vol. 9

Keywords:

Data mining; Frequent itemset; Categorical database; Clustering

DOI:

Not available

Chinese Library Classification (CLC) number:

Subject classification number:

Abstract:

Frequent itemset mining was initially proposed and has been studied extensively in the context of association rule mining. In recent years, several studies have also extended its application to transaction or document clustering. However, most of the frequent itemset based clustering algorithms need to first mine a large intermediate set of frequent itemsets in order to identify a subset of the most promising ones that can be used for clustering. In this paper, we study how to directly find a subset of high quality frequent itemsets that can be used as a concise summary of the transaction database and to cluster the categorical data. By exploring key properties of the subset of itemsets that we are interested in, we proposed several search space pruning methods and designed an efficient algorithm called SUMMARY. Our empirical results show that SUMMARY runs very fast even when the minimum support is extremely low and scales very well with respect to the database size, and surprisingly, as a pure frequent itemset mining algorithm it is very effective in clustering the categorical data and summarizing the dense transaction databases.
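
The idea in the abstract of using frequent itemsets as a summary that also groups transactions can be illustrated with a small sketch. This is not the SUMMARY algorithm: it enumerates itemsets by brute force, with none of the search space pruning the paper describes, and it assumes, purely for illustration, that each transaction is represented by a longest frequent itemset it contains. That selection rule, the tie-breaking, and the names `frequent_itemsets` and `summarize` are all hypothetical and not taken from the abstract.

```python
from collections import defaultdict
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Brute-force count of every itemset; keep those meeting min_support.

    Exponential in transaction length, so only suitable for toy data.
    """
    counts = defaultdict(int)
    for t in transactions:
        items = sorted(t)
        for k in range(1, len(items) + 1):
            for combo in combinations(items, k):
                counts[frozenset(combo)] += 1
    return {s: c for s, c in counts.items() if c >= min_support}


def summarize(transactions, min_support):
    """Group transaction indices by an assumed representative itemset.

    Assumption (not from the abstract): the representative is the longest
    frequent itemset contained in the transaction, with ties broken by
    higher support and then by item order.
    """
    freq = frequent_itemsets(transactions, min_support)
    clusters = defaultdict(list)
    for idx, t in enumerate(transactions):
        covered = [s for s in freq if s <= t]
        if not covered:
            continue  # no frequent itemset covers this transaction
        best = max(covered, key=lambda s: (len(s), freq[s], sorted(s)))
        clusters[best].append(idx)
    return dict(clusters)


transactions = [
    {"a", "b", "c"},
    {"a", "b", "d"},
    {"a", "b"},
    {"c", "d"},
    {"c", "d", "e"},
]
clusters = summarize(transactions, min_support=2)
for itemset, members in clusters.items():
    print(sorted(itemset), members)
```

On this toy data the first three transactions are grouped under the itemset {a, b} and the last two under {c, d}, so two itemsets serve as both a summary and a clustering of the five transactions.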

Pages: 19-37

Page count: 18

50 items in total

[1] On efficiently summarizing categorical databases
Wang, JY
Karypis, G
[J]. KNOWLEDGE AND INFORMATION SYSTEMS, 2006, 9 (01) : 19 - 37
[2] Summarizing Relational Databases
Yang, Xiaoyan
Procopiuc, Cecilia M.
Srivastava, Divesh
[J]. PROCEEDINGS OF THE VLDB ENDOWMENT, 2009, 2 (01):
[3] Summarizing categorical data by clustering attributes
Mampaey, Michael
Vreeken, Jilles
[J]. DATA MINING AND KNOWLEDGE DISCOVERY, 2013, 26 (01) : 130 - 173
[4] Summarizing categorical data by clustering attributes
Michael Mampaey
Jilles Vreeken
[J]. Data Mining and Knowledge Discovery, 2013, 26 : 130 - 173
[5] On Efficiently Summarizing a Large Dynamic Graph
Khan, Kifayat Ullah
Rasel, Mostofa Kamal
Noorulamin, Muhammad
Nawaz, Waqas
Lee, Young-Koo
[J]. 2016 INTERNATIONAL CONFERENCE ON BIG DATA AND SMART COMPUTING (BIGCOMP), 2016, : 345 - 348
[6] Efficiently summarizing attributed diffusion networks
Sorour E. Amiri
Liangzhe Chen
B. Aditya Prakash
[J]. Data Mining and Knowledge Discovery, 2018, 32 : 1251 - 1274
[7] SUMMARY: Efficiently summarizing transactions for clustering
Wang, JY
Karypis, G
[J]. FOURTH IEEE INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS, 2004, : 241 - 248
[8] Efficiently summarizing attributed diffusion networks
Amiri, Sorour E.
Chen, Liangzhe
Prakash, B. Aditya
[J]. DATA MINING AND KNOWLEDGE DISCOVERY, 2018, 32 (05) : 1251 - 1274
[9] Summarizing transactional databases with overlapped hyperrectangles
Xiang, Yang
Jin, Ruoming
Fuhry, David
Dragan, Feodor F.
[J]. DATA MINING AND KNOWLEDGE DISCOVERY, 2011, 23 (02) : 215 - 251
[10] Summarizing transactional databases with overlapped hyperrectangles
Yang Xiang
Ruoming Jin
David Fuhry
Feodor F. Dragan
[J]. Data Mining and Knowledge Discovery, 2011, 23 : 215 - 251
