On efficiently summarizing categorical databases

Cited by: 29

Authors:

Wang, JY

Karypis, G [1]

Affiliations:

[1] Univ Minnesota, Digital Technol Ctr, Dept Comp Sci, Minneapolis, MN 55455 USA

[2] Univ Minnesota, Army HPC Res Ctr, Minneapolis, MN 55455 USA

[3] Tsinghua Univ, Dept Comp Sci & Technol, Beijing 100084, Peoples R China

Source:

KNOWLEDGE AND INFORMATION SYSTEMS | 2006 / Vol. 9 / Issue 01

Keywords:

data mining; frequent itemset; categorical database; clustering;

DOI:

10.1007/s10115-005-0216-7

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Frequent itemset mining was initially proposed and has been studied extensively in the context of association rule mining. In recent years, several studies have also extended its application to transaction or document clustering. However, most of the frequent-itemset-based clustering algorithms need to first mine a large intermediate set of frequent itemsets in order to identify a subset of the most promising ones that can be used for clustering. In this paper, we study how to directly find a subset of high-quality frequent itemsets that can be used as a concise summary of the transaction database and to cluster the categorical data. By exploring key properties of the subset of itemsets that we are interested in, we proposed several search space pruning methods and designed an efficient algorithm called SUMMARY. Our empirical results show that SUMMARY runs very fast even when the minimum support is extremely low and scales very well with respect to the database size, and surprisingly, as a pure frequent itemset mining algorithm it is very effective in clustering the categorical data and summarizing the dense transaction databases.


Pages: 19-37

Page count: 19

