Clustering very large databases using EM mixture models

Cited by: 0

Authors:

Bradley, PS

Fayyad, UM

Reina, CA

Source:

15TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION, VOL 2, PROCEEDINGS: PATTERN RECOGNITION AND NEURAL NETWORKS | 2000

DOI: not available

Chinese Library Classification number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Clustering very large databases is a challenge for traditional pattern recognition algorithms, e.g. the Expectation-Maximization (EM) algorithm for fitting mixture models, because of high memory and iteration requirements. Over large databases, the cost of the numerous scans required to converge, together with the algorithm's large memory requirements, becomes prohibitive. We present a decomposition of the EM algorithm that requires only a small amount of memory by limiting iterations to small data subsets. The scalable EM approach requires at most one database scan and is based on identifying regions of the data that are discardable, regions that are compressible, and regions that must be maintained in memory. Data resolution is preserved to the extent possible given the size of the memory buffer and the fit of the current model to the data. Computational tests demonstrate that the scalable scheme outperforms similarly constrained EM approaches.
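To make the single-scan idea in the abstract concrete, the following is a minimal Python sketch, not the authors' algorithm. It assumes diagonal-covariance Gaussian components and merges the "discardable" and "compressible" regions into one rule: any buffered point whose largest membership probability reaches a hypothetical threshold `compress_at` is folded into per-component sufficient statistics (count, sum, sum of squares) and dropped from memory, while uncertain points stay in the buffer at full resolution. The function name `scalable_em` and all parameters are illustrative assumptions.

```python
import numpy as np

def responsibilities(x, w, mean, var):
    """Posterior membership probabilities, shape (n, k), for diagonal Gaussians."""
    diff = x[:, None, :] - mean[None, :, :]
    logp = np.log(w) - 0.5 * np.sum(diff**2 / var + np.log(2 * np.pi * var), axis=2)
    logp -= logp.max(axis=1, keepdims=True)   # stabilise before exponentiating
    r = np.exp(logp)
    return r / r.sum(axis=1, keepdims=True)

def scalable_em(chunks, k, init_mean=None, n_iter=20, compress_at=0.95, seed=0):
    """Single-pass EM sketch: each chunk is read once, then summarised or retained."""
    rng = np.random.default_rng(seed)
    mean = None
    for chunk in chunks:
        chunk = np.asarray(chunk, dtype=float)
        if mean is None:                      # initialise from the first chunk
            d = chunk.shape[1]
            cnt, sx, sxx = np.zeros(k), np.zeros((k, d)), np.zeros((k, d))
            retained = np.empty((0, d))       # points kept at full resolution
            if init_mean is None:
                mean = chunk[rng.choice(len(chunk), k, replace=False)]
            else:
                mean = np.array(init_mean, dtype=float)
            var = np.tile(chunk.var(axis=0) + 1e-6, (k, 1))
            w = np.full(k, 1.0 / k)
        buf = np.vstack([retained, chunk])    # in-memory working set
        for _ in range(n_iter):
            # E-step on buffered points only; summaries enter the M-step directly.
            r = responsibilities(buf, w, mean, var)
            n_k = r.sum(axis=0) + cnt + 1e-9
            s1 = r.T @ buf + sx
            s2 = r.T @ buf**2 + sxx
            w = n_k / n_k.sum()
            mean = s1 / n_k[:, None]
            var = np.maximum(s2 / n_k[:, None] - mean**2, 1e-6)
        # Compress confidently assigned points into sufficient statistics.
        r = responsibilities(buf, w, mean, var)
        sure = r.max(axis=1) >= compress_at
        lab = r.argmax(axis=1)[sure]
        np.add.at(cnt, lab, 1.0)
        np.add.at(sx, lab, buf[sure])
        np.add.at(sxx, lab, buf[sure] ** 2)
        retained = buf[~sure]                 # uncertain points stay in memory
    return w, mean, var, len(retained)
```

In this sketch the memory footprint after each chunk is the retained points plus three small arrays per component, which illustrates how resolution is traded against buffer size. A real implementation would also cap the size of `retained` and use a more careful compression test than a single probability threshold.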

Pages: 76-80

Page count: 3

