SVStream: A Support Vector-Based Algorithm for Clustering Data Streams

Cited by: 49
Authors
Wang, Chang-Dong [1]
Lai, Jian-Huang [1]
Huang, Dong [1]
Zheng, Wei-Shi [1]
Affiliations
[1] Sun Yat Sen Univ, Sch Informat Sci & Technol, Guangzhou Higher Educ Mega Ctr, Guangzhou 510006, Guangdong, Peoples R China
Keywords
Data stream clustering; support vector; clusters of arbitrary shape; overlapping; evolving; noise; CLASSIFIER; ROBUST;
DOI
10.1109/TKDE.2011.263
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, we propose a novel data stream clustering algorithm, termed SVStream, which is based on support vector domain description and support vector clustering. In the proposed algorithm, the data elements of a stream are mapped into a kernel space, and the support vectors are used as the summary information of the historical elements to construct cluster boundaries of arbitrary shape. To adapt to both dramatic and gradual changes, multiple spheres are dynamically maintained, each describing the corresponding data domain presented in the data stream. By allowing for bounded support vectors (BSVs), the proposed SVStream algorithm is capable of identifying overlapping clusters. A BSV decaying mechanism is designed to automatically detect and remove outliers (noise). We perform experiments over synthetic and real data streams, with the overlapping, evolving, and noise situations taken into consideration. Comparison results with state-of-the-art data stream clustering methods demonstrate the effectiveness and efficiency of the proposed method.
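The support vector domain description that the abstract builds on can be illustrated with a minimal sketch. This is not the SVStream algorithm itself: it fits a single sphere in a Gaussian kernel space for a static batch of points, without bounded support vectors, multiple spheres, or decay. The function names (`gaussian_kernel`, `fit_sphere`, `dist2`), the kernel width `q`, and the use of a Frank-Wolfe solver are assumptions made for illustration, not details taken from the paper.

```python
import math

def gaussian_kernel(a, b, q=1.0):
    """Gaussian kernel k(a, b) = exp(-q * ||a - b||^2); q is an assumed width."""
    return math.exp(-q * sum((x - y) ** 2 for x, y in zip(a, b)))

def fit_sphere(points, q=1.0, iters=500):
    """Fit a minimal enclosing sphere in kernel space (hard-margin sketch).

    With a Gaussian kernel, k(x, x) = 1, so the dual problem reduces to
    minimizing beta^T K beta subject to beta >= 0 and sum(beta) = 1.
    A plain Frank-Wolfe loop is used here as a simple stand-in solver.
    Returns the coefficients beta and the constant term beta^T K beta.
    """
    n = len(points)
    K = [[gaussian_kernel(p, r, q) for r in points] for p in points]
    beta = [1.0 / n] * n
    for t in range(iters):
        # Gradient of beta^T K beta is 2 K beta.
        grad = [2.0 * sum(K[i][j] * beta[j] for j in range(n)) for i in range(n)]
        i_min = min(range(n), key=grad.__getitem__)
        gamma = 2.0 / (t + 2.0)
        # Move toward the simplex vertex with the smallest gradient entry.
        beta = [(1.0 - gamma) * b for b in beta]
        beta[i_min] += gamma
    const = sum(beta[i] * beta[j] * K[i][j] for i in range(n) for j in range(n))
    return beta, const

def dist2(x, points, beta, const, q=1.0):
    """Squared kernel-space distance from x to the sphere center."""
    cross = sum(b * gaussian_kernel(p, x, q) for b, p in zip(beta, points))
    return 1.0 - 2.0 * cross + const

# Illustrative data: a small blob of 2-D points.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5)]
beta, const = fit_sphere(points)
# Squared radius taken as the largest distance among the fitted points.
radius2 = max(dist2(p, points, beta, const) for p in points)
# Points with non-negligible weight act as the summary (support vectors);
# the 1e-3 cutoff is an arbitrary choice for this sketch.
support = [p for p, b in zip(points, beta) if b > 1e-3]
```

A new element `x` would be treated as inside the described domain when `dist2(x, points, beta, const) <= radius2`. In this sketch only the points in `support` contribute noticeably to the boundary, which is the sense in which support vectors can serve as summary information for historical elements.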
Pages: 1410-1424
Page count: 15