Efficient disk-based K-means clustering for relational databases

Cited by: 50
Authors
Ordonez, C
Omiecinski, E
Affiliations
[1] Teradata, Rancho Bernardo, CA 92127 USA
[2] Georgia Inst Technol, Coll Comp, Atlanta, GA 30332 USA
Keywords
clustering; K-means; relational databases; disk
DOI
10.1109/TKDE.2004.25
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
K-means is one of the most popular clustering algorithms. This article introduces an efficient disk-based implementation of K-means. The proposed algorithm is designed to work inside a relational database management system. It can cluster large data sets having very high dimensionality. In general, it only requires three scans over the data set. It is optimized to perform heavy disk I/O and its memory requirements are low. Its parameters are easy to set. An extensive experimental section evaluates quality of results and performance. The proposed algorithm is compared against the Standard K-means algorithm as well as the Scalable K-means algorithm.
Pages: 909-921
Page count: 13