A Hierarchical Algorithm for Extreme Clustering

Times cited: 50
Authors
Kobren, Ari [1]
Monath, Nicholas [1]
Krishnamurthy, Akshay [1]
McCallum, Andrew [1]
Affiliation
[1] Univ Massachusetts Amherst, Coll Informat & Comp Sci, Amherst, MA 01002 USA
Funding
US National Science Foundation;
Keywords
Clustering; Large-scale learning;
DOI
10.1145/3097983.3098079
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Many modern clustering methods scale well to a large number of data points, N, but not to a large number of clusters, K. This paper introduces PERCH, a new non-greedy, incremental algorithm for hierarchical clustering that scales to both massive N and massive K, a problem setting we term extreme clustering. Our algorithm efficiently routes new data points to the leaves of an incrementally built tree. Motivated by the desire for both accuracy and speed, our approach performs tree rotations to enhance subtree purity and encourage balancedness. We prove that, under a natural separability assumption, our non-greedy algorithm will produce trees with perfect dendrogram purity regardless of data arrival order. Our experiments demonstrate that PERCH constructs more accurate trees than other tree-building clustering algorithms and scales well with both N and K, achieving a higher-quality clustering than the strongest flat clustering competitor in nearly half the time.
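To make "routing points to leaves" and "dendrogram purity" concrete, here is a minimal sketch under stated assumptions; it is not PERCH itself. It assumes Euclidean distance, finds the nearest leaf by brute force (rather than by searching the tree), and splits that leaf into a new internal node holding the old leaf and the new point. It performs none of the rotations or balancing the abstract describes. Dendrogram purity follows the commonly used definition: for every pair of points sharing a ground-truth cluster, take the fraction of leaves under the pair's lowest common ancestor that belong to that cluster, then average over all such pairs. The names `Node`, `insert`, `lca`, `dendrogram_purity`, and `build_tree` are hypothetical.

```python
import math
from itertools import combinations


class Node:
    """Binary tree node; a leaf stores one point and its ground-truth label."""

    def __init__(self, point=None, label=None, children=None):
        self.point = point
        self.label = label
        self.children = children or []
        self.parent = None
        for child in self.children:
            child.parent = self

    def leaves(self):
        if not self.children:
            return [self]
        return [leaf for child in self.children for leaf in child.leaves()]


def insert(root, point, label):
    """Hypothetical greedy insert: split the nearest leaf (brute force, no rotations)."""
    new_leaf = Node(point, label)
    if root is None:
        return new_leaf, new_leaf
    nearest = min(root.leaves(), key=lambda lf: math.dist(lf.point, point))
    parent = nearest.parent  # saved before the split re-parents `nearest`
    split = Node(children=[nearest, new_leaf])
    if parent is None:
        return split, new_leaf  # the split node becomes the new root
    parent.children[parent.children.index(nearest)] = split
    split.parent = parent
    return root, new_leaf


def ancestors(node):
    path = []
    while node is not None:
        path.append(node)
        node = node.parent
    return path


def lca(a, b):
    """Lowest common ancestor of two nodes in the same tree."""
    seen = set(ancestors(a))
    return next(n for n in ancestors(b) if n in seen)


def dendrogram_purity(leaves):
    """Average LCA-subtree purity over all same-label leaf pairs."""
    pairs = [(a, b) for a, b in combinations(leaves, 2) if a.label == b.label]
    if not pairs:
        return 1.0
    total = 0.0
    for a, b in pairs:
        subtree = lca(a, b).leaves()
        total += sum(lf.label == a.label for lf in subtree) / len(subtree)
    return total / len(pairs)


def build_tree(stream):
    """Insert (point, label) pairs in arrival order; return root and leaves."""
    root, leaves = None, []
    for point, label in stream:
        root, leaf = insert(root, point, label)
        leaves.append(leaf)
    return root, leaves


if __name__ == "__main__":
    # Well-separated 1-D clusters: "a" near 0, "b" at 10.
    _, leaves = build_tree([((0.0,), "a"), ((1.0,), "a"), ((10.0,), "b")])
    print(dendrogram_purity(leaves))  # prints 0.6666666666666666
    _, leaves = build_tree([((0.0,), "a"), ((10.0,), "b"), ((1.0,), "a")])
    print(dendrogram_purity(leaves))  # prints 1.0
```

In the first arrival order, the "b" point is split off the second "a" leaf, so the lowest common ancestor of the two "a" points also contains "b" and purity drops to 2/3, even though the data are cleanly separated. Reordering the same three points yields a perfect tree. This order sensitivity of the purely greedy scheme is the kind of impurity that the rotations described above are intended to address.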
Pages: 255-264
Number of pages: 10
Related papers
50 in total
  • [1] Hierarchical++: improving the hierarchical clustering algorithm
    Pinheiro, Wallace Anacleto
    Pinheiro, Ana Barbara Sapienza
    [J]. INTERNATIONAL JOURNAL OF DATA MINING MODELLING AND MANAGEMENT, 2023, 15 (03) : 223 - 239
  • [2] A New Hierarchical Clustering Algorithm
    Starczewski, Artur
    [J]. ARTIFICIAL INTELLIGENCE AND SOFT COMPUTING, PT II, 2012, 7268 : 175 - 180
  • [3] Hierarchical division algorithm of clustering
    Dvoenko, S.D.
    [J]. Avtomatika i Telemekhanika, 1999, (04) : 117 - 124
  • [4] Document clustering with hierarchical algorithm
    Wang, Y
    Hodges, J
    [J]. Proceedings of the 8th Joint Conference on Information Sciences, Vols 1-3, 2005 : 1614 - 1617
  • [5] A New Hierarchical Clustering Algorithm
    Nazari, Zahra
    Kang, Dongshik
    Asharif, M. Reza
    Sung, Yulwan
    Ogawa, Seiji
    [J]. 2015 INTERNATIONAL CONFERENCE ON INTELLIGENT INFORMATICS AND BIOMEDICAL SCIENCES (ICIIBMS), 2015 : 148 - 152
  • [6] Extreme Anomalous Score Clustering Algorithm
    Lisuwan, Panuruk
    Boonserm, Petarpa
    Sinapiromsaran, Krung
    [J]. PROCEEDINGS OF THE 2017 INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY (ICIT 2017), 2017 : 66 - 70
  • [7] Hierarchical link clustering algorithm in networks
    Bodlaj, Jernej
    Batagelj, Vladimir
    [J]. PHYSICAL REVIEW E, 2015, 91 (06)
  • [8] Hierarchical clustering algorithm based on granularity
    Liang, Jiuzhen
    Li, Guangbin
    [J]. GRC: 2007 IEEE INTERNATIONAL CONFERENCE ON GRANULAR COMPUTING, PROCEEDINGS, 2007 : 429 - 432
  • [9] A TRANSFER ALGORITHM FOR HIERARCHICAL-CLUSTERING
    SCHADER, M
    [J]. MATHEMATICAL SOCIAL SCIENCES, 1982, 2 (02) : 189 - 197
  • [10] An adaptive parallel hierarchical clustering algorithm
    Li, Zhaopeng
    Li, Kenli
    Xiao, Degui
    Yang, Lei
    [J]. HIGH PERFORMANCE COMPUTING AND COMMUNICATIONS, PROCEEDINGS, 2007, 4782 : 97 - 107