Incremental Density-Based Clustering on Multicore Processors

Cited by: 9
Authors
Mai, Son T. [1]
Jacobsen, Jon [5]
Amer-Yahia, Sihem [3]
Spence, Ivor [2]
Nhat-Phuong Tran [1]
Assent, Ira [6]
Quoc Viet Hung Nguyen [4]
Affiliations
[1] Queens Univ Belfast, Belfast BT7 1NN, Antrim, North Ireland
[2] Queens Univ Belfast, Sch Elect Elect Engn & Comp Sci, Artificial Intelligence Res Theme, Belfast BT7 1NN, Antrim, North Ireland
[3] Univ Grenoble Alpes, F-38400 St Martin Dheres, France
[4] Griffith Univ, Brisbane, Qld 4222, Australia
[5] Aarhus Univ, DK-8000 Aarhus, Denmark
[6] Aarhus Univ, Comp Sci, DK-8000 Aarhus, Denmark
Keywords
Clustering algorithms; Multicore processing; Databases; Instruction sets; Electronic mail; Time factors; Clustering methods; Density-based clustering; anytime clustering; incremental clustering; active clustering; multicore CPUs; ALGORITHM; DBSCAN;
DOI
10.1109/TPAMI.2020.3023125
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Density-based clustering is a fundamental data clustering technique with many real-world applications. However, when the database changes frequently, effectively updating the clustering results rather than reclustering from scratch remains a challenging task. In this work, we introduce IncAnyDBC, a unique parallel incremental data clustering approach that addresses this problem. First, IncAnyDBC can process changes in bulk rather than in batches like state-of-the-art methods, which reduces update overhead. Second, it maintains an underlying cluster structure called the object node graph during the clustering process and uses it as the basis for incrementally updating clusters with respect to objects inserted into or deleted from the database, propagating changes around the affected nodes only. In addition, IncAnyDBC actively and iteratively examines the graph and chooses only a small set of the most meaningful objects, either to produce the exact clustering results of DBSCAN or to approximate them under arbitrary time constraints. This makes it more efficient than existing methods. Third, by processing objects in blocks, IncAnyDBC can be efficiently parallelized on multicore CPUs, yielding a work-efficient method. It runs much faster than existing techniques with a single thread while still scaling well with multiple threads. Experiments on various large real-world datasets demonstrate the performance of IncAnyDBC.
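The idea of updating clusters around affected objects only, rather than reclustering from scratch, can be illustrated with a minimal sketch. This is not IncAnyDBC and does not use its object node graph. It is a generic insertion-only incremental DBSCAN, assuming a brute-force neighbor search and a union-find over core points. The class name `IncrementalDBSCAN` and its methods are hypothetical, chosen for illustration.

```python
from math import dist


class IncrementalDBSCAN:
    """Illustrative insertion-only incremental DBSCAN (not IncAnyDBC)."""

    def __init__(self, eps, min_pts):
        self.eps = eps
        self.min_pts = min_pts
        self.points = []
        self.neighbors = []  # neighbors[i]: indices within eps of point i, including i
        self.parent = []     # union-find forest linking core points of one cluster

    def _find(self, i):
        # Find the representative of i, shortening the path as we go.
        while self.parent[i] != i:
            self.parent[i] = self.parent[self.parent[i]]
            i = self.parent[i]
        return i

    def _union(self, a, b):
        ra, rb = self._find(a), self._find(b)
        if ra != rb:
            self.parent[rb] = ra

    def is_core(self, i):
        return len(self.neighbors[i]) >= self.min_pts

    def insert(self, p):
        new = len(self.points)
        # Brute-force O(n) range query; a spatial index would replace this.
        near = [i for i, q in enumerate(self.points) if dist(p, q) <= self.eps]
        self.points.append(p)
        self.neighbors.append(near + [new])
        self.parent.append(new)
        for i in near:
            self.neighbors[i].append(new)
        # Only the new point and its eps-neighbors can change status,
        # so cluster links are refreshed around these affected points only.
        for i in near + [new]:
            if self.is_core(i):
                for j in self.neighbors[i]:
                    if self.is_core(j):
                        self._union(i, j)
        return new

    def labels(self):
        # Core points: union-find root. Border points: cluster of any
        # core neighbor. Everything else: noise (-1).
        out = []
        for i in range(len(self.points)):
            if self.is_core(i):
                out.append(self._find(i))
            else:
                cores = [j for j in self.neighbors[i] if self.is_core(j)]
                out.append(self._find(cores[0]) if cores else -1)
        return out
```

Inserting a point between two existing clusters merges them without revisiting unaffected points. Deletions, which can split clusters, are not handled by this sketch and need more machinery.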
Pages: 1338 - 1356
Page count: 19