Clustering of large time series datasets

Cited by: 22
Authors
Aghabozorgi, Saeed [1]
Teh, Ying Wah [1]
Affiliation
[1] Univ Malaya, Fac Comp Sci & Informat Technol, Kuala Lumpur 50603, Malaysia
Keywords
Data mining; clustering; time series; large datasets; FAST SIMILARITY SEARCH; DIMENSIONALITY REDUCTION; AVERAGING METHOD; REPRESENTATION; RETRIEVAL; ALGORITHM
DOI
10.3233/IDA-140669
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Time series clustering is a very effective approach for discovering valuable information in various systems such as finance, embedded bio-sensors, and genomes. However, the focus on the efficiency and scalability of these algorithms for time series data has come at the expense of the usability and effectiveness of clustering. In this paper, a new multi-step approach is proposed to improve the accuracy of time series clustering. In the first step, the time series are clustered approximately. In the second step, the resulting clusters are split into sub-clusters. In the third step, the sub-clusters are merged. In contrast to existing approaches, this method can generate accurate clusters based on similarity in shape in very large time series datasets. The accuracy of the proposed method is evaluated using various published datasets from different domains.
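The three steps named in the abstract (approximate clustering, splitting, merging) can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not say which representation, distance, or clustering routines the paper uses. The sketch assumes z-normalization, piecewise aggregate approximation (PAA) for the coarse pass, Euclidean distance, and a simple greedy "leader" grouping. All function names and the three radius parameters are hypothetical.

```python
import math
from statistics import mean, pstdev

def znorm(series):
    """Rescale to zero mean and unit variance so distances reflect shape, not amplitude."""
    mu, sigma = mean(series), pstdev(series)
    return [(x - mu) / sigma if sigma > 0 else 0.0 for x in series]

def paa(series, segments):
    """Piecewise aggregate approximation: the mean of each of `segments` chunks.
    Assumes segments <= len(series)."""
    n = len(series)
    return [mean(series[i * n // segments:(i + 1) * n // segments])
            for i in range(segments)]

def leader_groups(indices, vectors, radius):
    """Greedy grouping: join the first group whose leader is within `radius`,
    otherwise start a new group with this item as its leader."""
    groups = []  # list of (leader_vector, member_indices)
    for i in indices:
        for leader, members in groups:
            if math.dist(vectors[i], leader) <= radius:
                members.append(i)
                break
        else:
            groups.append((vectors[i], [i]))
    return [members for _, members in groups]

def centroid(indices, vectors):
    """Point-wise mean of the selected vectors, used as a sub-cluster prototype."""
    return [mean(column) for column in zip(*(vectors[i] for i in indices))]

def multi_step_cluster(raw, segments=4, coarse_radius=1.0,
                       split_radius=1.0, merge_radius=1.0):
    full = [znorm(s) for s in raw]
    reduced = [paa(s, segments) for s in full]

    # Step 1 (assumed form): approximate clustering on the low-resolution PAA vectors.
    coarse = leader_groups(range(len(raw)), reduced, coarse_radius)

    # Step 2 (assumed form): split each coarse cluster using the full-resolution series.
    subs = [sub for group in coarse
            for sub in leader_groups(group, full, split_radius)]

    # Step 3 (assumed form): merge sub-clusters whose prototypes are close (union-find).
    protos = [centroid(s, full) for s in subs]
    parent = list(range(len(subs)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a in range(len(subs)):
        for b in range(a + 1, len(subs)):
            if math.dist(protos[a], protos[b]) <= merge_radius:
                parent[find(b)] = find(a)

    clusters = {}
    for k, members in enumerate(subs):
        clusters.setdefault(find(k), []).extend(members)
    return sorted(sorted(c) for c in clusters.values())
```

Calling `multi_step_cluster` on two rising and two falling series of different amplitude groups them by shape, since z-normalization removes the scale difference before any distance is computed. The coarse pass touches only the short PAA vectors, which is where a design like this would save work on large datasets.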
Pages: 793-817
Page count: 25