Halite: Fast and Scalable Multiresolution Local-Correlation Clustering

Cited by: 11
Authors
Cordeiro, Robson L. F. [1]
Traina, Agma J. M. [1]
Faloutsos, Christos [2]
Traina, Caetano, Jr. [1]
Affiliations
[1] Univ Sao Paulo, Dept Comp Sci, BR-13566590 Sao Carlos, SP, Brazil
[2] Carnegie Mellon Univ, Dept Comp Sci, Pittsburgh, PA 15213 USA
Funding
São Paulo Research Foundation (FAPESP), Brazil; US National Science Foundation
Keywords
Local-correlation clustering; moderate-to-high dimensional data; data mining
DOI
10.1109/TKDE.2011.176
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
This paper proposes Halite, a novel, fast, and scalable clustering method that looks for clusters in subspaces of multidimensional data. Existing methods are typically superlinear in space or execution time. Halite's strengths are that it is fast and scalable while still giving highly accurate results. Specifically, the main contributions of Halite are: 1) Scalability: it is linear or quasi-linear in time and space with respect to the data size, the data dimensionality, and the dimensionality of the clusters' subspaces; 2) Usability: it is deterministic, is robust to noise, does not take the number of clusters as an input parameter, and detects clusters in subspaces generated by the original axes or by their linear combinations, including space rotation; 3) Effectiveness: it is accurate, providing results of equal or better quality compared to the top related works; and 4) Generality: it includes a soft clustering approach. We performed experiments on synthetic data with five to 30 axes and up to 1 million points. On average, Halite was at least 12 times faster than seven representative works, and it always produced highly accurate results. On real data, Halite was at least 11 times faster than the others and improved on their accuracy by up to 35 percent. Finally, we report experiments in a real scenario where soft clustering is desirable.
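The abstract claims cost that is linear or quasi-linear in data size and dimensionality, and the title calls Halite "multiresolution", but the record does not describe Halite's internals. As a hedged illustration only, the sketch below shows a generic multiresolution grid-counting pass. It is not Halite's algorithm. Each point is placed in exactly one cell per resolution level, so the work is O(N · D · H) for N points, D axes, and H levels. The function names `multiresolution_counts` and `dense_cells`, the assumption that coordinates are pre-normalized to [0, 1), and the `factor` density threshold are all hypothetical choices made for this example.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Cell = Tuple[int, ...]


def multiresolution_counts(points: Sequence[Sequence[float]], levels: int) -> List[Counter]:
    """Count points per grid cell at cell widths 1/2, 1/4, ..., 1/2**levels.

    Hypothetical helper: assumes every coordinate is already normalized to [0, 1).
    Each point is assigned to one cell per level, so the cost is O(N * D * levels).
    """
    grids: List[Counter] = []
    for h in range(1, levels + 1):
        side = 2 ** h  # number of cells along each axis at this level
        counts: Counter = Counter()
        for p in points:
            # Clamp with side - 1 so a coordinate of exactly 1.0 stays in range.
            cell: Cell = tuple(min(int(x * side), side - 1) for x in p)
            counts[cell] += 1
        grids.append(counts)
    return grids


def dense_cells(counts: Counter, n_points: int, side: int, dims: int,
                factor: float = 4.0) -> List[Cell]:
    """Return cells holding more than `factor` times the count expected
    under a uniform distribution (a hypothetical density rule)."""
    expected = n_points / float(side) ** dims
    return sorted(c for c, k in counts.items() if k > factor * expected)


if __name__ == "__main__":
    pts = [(0.1, 0.1), (0.12, 0.11), (0.13, 0.09), (0.9, 0.5)]
    grids = multiresolution_counts(pts, levels=2)
    # Level 2 uses a 4 x 4 grid; the three nearby points share cell (0, 0).
    print(dense_cells(grids[1], len(pts), side=4, dims=2))
```

Counting per level with a hash map keeps memory proportional to the number of occupied cells rather than to side ** D, which is what keeps a sketch like this usable as D grows.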
Pages: 387-401
Number of pages: 15