Clustering of large time series datasets

Cited by: 22

Authors:

Aghabozorgi, Saeed [1]

Teh, Ying Wah [1]

Affiliations:

[1] Univ Malaya, Fac Comp Sci & Informat Technol, Kuala Lumpur 50603, Malaysia

Source:

INTELLIGENT DATA ANALYSIS | 2014, Vol. 18, No. 5

Keywords:

Data mining; clustering; time series; large datasets; FAST SIMILARITY SEARCH; DIMENSIONALITY REDUCTION; AVERAGING METHOD; REPRESENTATION; RETRIEVAL; ALGORITHM;

DOI:

10.3233/IDA-140669

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Time series clustering is an effective approach for discovering valuable information in systems such as finance, embedded bio-sensors, and genomics. However, the focus on making these algorithms efficient and scalable for time series data has come at the expense of the usability and effectiveness of the clustering. This paper proposes a new multi-step approach to improve the accuracy of time series clustering. In the first step, the time series are clustered approximately. In the second step, the resulting clusters are split into sub-clusters. In the third step, the sub-clusters are merged. In contrast to existing approaches, this method can generate accurate clusters based on similarity in shape in very large time series datasets. The accuracy of the proposed method is evaluated on various published datasets from different domains.
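The three steps named in the abstract (approximate clustering, splitting, merging) can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not say which representation, distance, or clustering method each step uses. The sketch assumes piecewise aggregate approximation (PAA) for the approximate step, a greedy leader-style grouping with Euclidean distance for the first two steps, and closest-prototype merging for the third. All function names and the parameters `segments`, `coarse_radius`, and `fine_radius` are hypothetical, and every series is assumed to have the same length, at least `segments`.

```python
import math
from statistics import fmean

def znorm(s):
    # Z-normalize so that series are compared by shape, not by scale or offset.
    mu = fmean(s)
    sd = math.sqrt(fmean((x - mu) ** 2 for x in s)) or 1.0
    return [(x - mu) / sd for x in s]

def paa(s, segments):
    # Piecewise aggregate approximation: mean of each of `segments` chunks.
    n = len(s)
    return [fmean(s[i * n // segments:(i + 1) * n // segments])
            for i in range(segments)]

def leader_groups(items, key, radius):
    # Greedy grouping: an item joins the first group whose leader
    # (first member) lies within `radius`, otherwise it starts a new group.
    groups = []
    for it in items:
        for g in groups:
            if math.dist(key(it), key(g[0])) <= radius:
                g.append(it)
                break
        else:
            groups.append([it])
    return groups

def multi_step_cluster(series, k, segments=4, coarse_radius=1.0, fine_radius=1.0):
    data = [znorm(s) for s in series]
    idx = list(range(len(data)))
    # Step 1 (assumed): approximate clustering on the reduced PAA representation.
    coarse = leader_groups(idx, lambda i: paa(data[i], segments), coarse_radius)
    # Step 2 (assumed): split each approximate cluster using full-resolution series.
    subs = [g for c in coarse
            for g in leader_groups(c, lambda i: data[i], fine_radius)]
    # Step 3 (assumed): merge the two sub-clusters with the closest mean
    # prototypes until only k clusters remain.
    while len(subs) > k:
        protos = [[fmean(col) for col in zip(*(data[i] for i in g))] for g in subs]
        a, b = min(((p, q) for p in range(len(subs))
                    for q in range(p + 1, len(subs))),
                   key=lambda pq: math.dist(protos[pq[0]], protos[pq[1]]))
        subs[a] += subs.pop(b)
    return [sorted(g) for g in subs]

# Usage: two rising and two falling series, grouped into k=2 clusters of indices.
rising = [1, 2, 3, 4, 5, 6, 7, 8]
falling = rising[::-1]
clusters = multi_step_cluster(
    [rising, [2 * x for x in rising], falling, [2 * x for x in falling]], k=2)
```

Running the cheap approximate step first limits the full-resolution comparisons in the second step to series that already share a coarse group, which is the general motivation for multi-step schemes on large datasets.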


Pages: 793-817

Page count: 25
