Cost-effective and adaptive clustering algorithm for stream processing on cloud system

Authors
Yue Xia
Junhua Fang
Pingfu Chao
Zhicheng Pan
Jedi S. Shang
Affiliations
[1] Soochow University, School of Computer Science and Technology
[2] The University of Queensland, School of Information Technology and Electrical Engineering
[3] Thinvent Technology Co. LTD.
Source
GeoInformatica | 2023 / Vol. 27
Keywords
Real-time processing; Density-based clustering; Window model; Time interval; Cluster evolution
DOI
Not available
Abstract
Clustering is a fundamental operation that plays an essential role in data management and analysis. Clustering algorithms have been well studied over the past two decades, but real-time clustering has yet to be applied maturely. For applications built on clustering, capturing the dynamic changes of clusters and the trends of moving objects in real time can maximize the value of the data. Although a DSPE (Distributed Stream Processing Engine) is capable of such workloads, it still suffers from fixed window sizes and wasted computational resources. In this paper, we introduce a new Cost-effective and Adaptive Clustering method (CeAC), which improves computational efficiency while ensuring the accuracy of the clustering result. Specifically, we design a composite window model that contains the latest data records and maintains historical states. To achieve lightweight clustering, we propose a fully online clustering algorithm based on grid density, which can capture clusters of arbitrary shape and effectively handle outliers in parallel. We further introduce an adaptive calculation model that accelerates the clustering operation by shedding workload according to the characteristics of the incoming data. Experimental results show that the proposed method is accurate and efficient in real-time data stream clustering.
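The abstract does not give the details of CeAC, so the following is only a generic illustration of the grid-density idea it mentions, not the authors' algorithm. It is a minimal batch sketch under stated assumptions: points are 2-D, a cell is "dense" when it holds at least `min_pts` points, and adjacent dense cells are merged into one cluster. The function name `grid_density_cluster` and the parameters `cell_size` and `min_pts` are hypothetical.

```python
from collections import Counter, deque


def grid_density_cluster(points, cell_size=1.0, min_pts=3):
    """Cluster 2-D points by merging adjacent dense grid cells.

    Returns one label per input point; -1 marks points that fall in
    sparse cells and are treated as outliers.
    NOTE: illustrative sketch only, not the CeAC method from the paper.
    """
    # Map each point to the integer coordinates of its grid cell.
    cells = [(int(x // cell_size), int(y // cell_size)) for x, y in points]

    # A cell is dense when it contains at least min_pts points (assumed rule).
    counts = Counter(cells)
    dense = {cell for cell, n in counts.items() if n >= min_pts}

    # Breadth-first flood fill over 8-connected dense cells forms clusters,
    # which lets a cluster follow an arbitrary shape across the grid.
    cell_label = {}
    next_label = 0
    for start in sorted(dense):
        if start in cell_label:
            continue
        cell_label[start] = next_label
        queue = deque([start])
        while queue:
            cx, cy = queue.popleft()
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    neighbor = (cx + dx, cy + dy)
                    if neighbor in dense and neighbor not in cell_label:
                        cell_label[neighbor] = next_label
                        queue.append(neighbor)
        next_label += 1

    # Points in non-dense cells receive the outlier label -1.
    return [cell_label.get(cell, -1) for cell in cells]
```

Because each point is touched once when it is assigned to a cell and each dense cell is visited once during merging, the work grows with the number of points plus the number of occupied cells. A streaming variant would additionally need to update cell counts as records enter and leave the window, which this sketch does not attempt.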
Pages: 1 - 21
Page count: 20
Related papers
50 in total
  • [1] Cost-effective and adaptive clustering algorithm for stream processing on cloud system
    Xia, Yue
    Fang, Junhua
    Chao, Pingfu
    Pan, Zhicheng
    Shang, Jedi S.
    GEOINFORMATICA, 2023, 27 (01) : 1 - 21
  • [2] Cost-Effective Stream Join Algorithm on Cloud System
    Fang, Junhua
    Zhang, Rong
    Wang, Xiaotong
    Fu, Tom Z. J.
    Zhang, Zhenjie
    Zhou, Aoying
    CIKM'16: PROCEEDINGS OF THE 2016 ACM CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, 2016, : 1773 - 1782
  • [3] Cost-Effective Data Partition for Distributed Stream Processing System
    Wang, Xiaotong
    Fang, Junhua
    Li, Yuming
    Zhang, Rong
    Zhou, Aoying
    DATABASE SYSTEMS FOR ADVANCED APPLICATIONS (DASFAA 2017), PT II, 2017, 10178 : 623 - 635
  • [4] Cost-effective clustering
    Gottlieb, S
    COMPUTER PHYSICS COMMUNICATIONS, 2001, 142 (1-3) : 43 - 48
  • [5] A cost-effective strategy for Cloud system maintenance
    Li, Xinyi
    Qi, Yong
    Chen, Pengfei
    Fan, Yang
    COMPUTERS & ELECTRICAL ENGINEERING, 2017, 58 : 176 - 189
  • [6] Cutting the Unnecessary Long Tail: Cost-Effective Big Data Clustering in the Cloud
    Li, Dongwei
    Wang, Shuliang
    Gao, Nan
    He, Qiang
    Yang, Yun
    IEEE TRANSACTIONS ON CLOUD COMPUTING, 2022, 10 (01) : 292 - 303
  • [7] Cost-Effective Peak Shaving Strategy Based on Clustering and XGBoost Algorithm
    Lim, Sol
    Gantassi, Rahma
    Choi, Yonghoon
    2023 INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE IN INFORMATION AND COMMUNICATION, ICAIIC, 2023, : 757 - 761
  • [8] A cost-effective and reliable cloud storage
    Wei, Yongmei
    Foo, Yong Wee
    2014 IEEE 7TH INTERNATIONAL CONFERENCE ON CLOUD COMPUTING (CLOUD), 2014, : 938 - 939
  • [9] A cost-effective scheme supporting adaptive service migration in cloud data center
    Bing Yu
    Yanni Han
    Hanning Yuan
    Xu Zhou
    Zhen Xu
    Frontiers of Computer Science, 2015, 9 : 875 - 886
  • [10] Cost-Effective, Workload-Adaptive Migration of Big Data Applications to the Cloud
    Giannakouris, Victor
    Fernandez, Alejandro
    Simitsis, Alkis
    Babu, Shivnath
    SIGMOD '19: PROCEEDINGS OF THE 2019 INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2019, : 1909 - 1912