Halite: Fast and Scalable Multiresolution Local-Correlation Clustering

Cited by: 11
Authors
Cordeiro, Robson L. F. [1]
Traina, Agma J. M. [1]
Faloutsos, Christos [2]
Traina, Caetano, Jr. [1]
Affiliations
[1] Univ Sao Paulo, Dept Comp Sci, BR-13566590 Sao Carlos, SP, Brazil
[2] Carnegie Mellon Univ, Dept Comp Sci, Pittsburgh, PA 15213 USA
Funding
São Paulo Research Foundation (Brazil); National Science Foundation (USA)
Keywords
Local-correlation clustering; moderate-to-high dimensional data; data mining
DOI
10.1109/TKDE.2011.176
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
This paper proposes Halite, a novel, fast, and scalable clustering method that looks for clusters in subspaces of multidimensional data. Existing methods are typically superlinear in space or execution time. Halite's strengths are that it is fast and scalable while still giving highly accurate results. Specifically, the main contributions of Halite are: 1) Scalability: it is linear or quasi-linear in time and space with respect to the data size and dimensionality, and the dimensionality of the clusters' subspaces; 2) Usability: it is deterministic, robust to noise, does not take the number of clusters as an input parameter, and detects clusters in subspaces generated by original axes or by their linear combinations, including space rotation; 3) Effectiveness: it is accurate, providing results of equal or better quality compared to top related works; and 4) Generality: it includes a soft clustering approach. Experiments were performed on synthetic data with five to 30 axes and up to 1 million points. Halite was on average at least 12 times faster than seven representative works and always presented highly accurate results. On real data, Halite was at least 11 times faster than the others, improving on their accuracy by up to 35 percent. Finally, we report experiments in a real scenario where soft clustering is desirable.
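The title's "multiresolution" idea and the abstract's linear-scaling claim can be illustrated with a simple hyper-grid counting pass. This is a rough sketch only, not Halite's published algorithm (whose details are not given in this record): it counts points per grid cell at several resolutions, halving the cell side at each level, so each level costs one pass of O(N·d). The function name `multiresolution_counts`, the assumption that coordinates are normalized to [0, 1), and the one-dictionary-per-level layout are all hypothetical choices made for this example.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Cell = Tuple[int, ...]


def multiresolution_counts(
    points: Sequence[Sequence[float]], levels: int
) -> List[Dict[Cell, int]]:
    """Count points per hyper-grid cell at `levels` resolutions.

    Hypothetical sketch: assumes every coordinate lies in [0, 1).
    At level h each axis is split into 2**h intervals, so every level
    halves the cell side. Time is O(N * d * levels); memory holds at
    most one entry per occupied cell per level.
    """
    grids: List[Dict[Cell, int]] = []
    for h in range(1, levels + 1):
        side = 2 ** h  # number of intervals per axis at this level
        counts: Counter = Counter()
        for p in points:
            # min() keeps a coordinate exactly equal to 1.0 in the last cell
            cell = tuple(min(int(x * side), side - 1) for x in p)
            counts[cell] += 1
        grids.append(dict(counts))
    return grids
```

For example, `multiresolution_counts([(0.1, 0.1), (0.12, 0.11), (0.9, 0.8)], 2)` places the first two points in cell `(0, 0)` at both levels, while the third point falls in `(1, 1)` at level 1 and `(3, 3)` at level 2. A density-based method could then look for cells that are much denser than their neighbours at some resolution.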
Pages: 387-401
Number of pages: 15