Clustering Distributed Short Time Series with Dense Patterns

Cited by: 3
Authors
da Silva, Josenildo C. [1]
Oliveira, Gustavo H. B. S. [1]
Lodi, Stefano [2]
Klusch, Matthias [3]
Affiliations
[1] Inst Fed Maranhao IFMA, Dept Comp, Ave Getulio Vargas 04, BR-65030005 Sao Luis, MA, Brazil
[2] Dipartimento Informat Sci & Ingn, Viale Risorgimento 2, Bologna, Italy
[3] DFKI GmbH, Stuhlsatzenhausweg 3, Campus D3-2, D-66123 Saarbrucken, Germany
Keywords
Time series clustering; short time series; distributed data clustering; LOGISTIC-REGRESSION; COMPUTATION; ALGORITHM; SELECTION; TOOL
DOI
10.1109/ICMLA.2017.0-181
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The clustering of genes with similar temporal profiles is an important task in gene expression data analysis. Current approaches to clustering sparse gene expression data with temporal information are not distributed and have at least quadratic complexity in the number of clusters, the number of genes, or both. In this paper, we present DTSCluster, the first distributed, density-based approach to short time series clustering, which is suitable for gene expression data. DTSCluster identifies dense patterns in the distributed datasets and uses them to generate the time series clusters. Comparative experimental results show that DTSCluster scales with dataset size, with linear complexity in time and space, and that it also outperforms other representative approaches in cluster validation with the silhouette index. The distributed scenario also opens up the opportunity for collaborative data mining between different gene expression data holders.
Pages: 34-41
Number of pages: 8
Related papers
50 in total
  • [1] Fuzzy clustering of short time-series and unevenly distributed sampling points
    Möller-Levet, CS
    Klawonn, F
    Cho, KH
    Wolkenhauer, O
    [J]. ADVANCES IN INTELLIGENT DATA ANALYSIS V, 2003, 2810 : 330 - 340
  • [2] Clustering Distributed Time Series in Sensor Networks
    Yin, Jie
    Gaber, Mohamed Medhat
    [J]. ICDM 2008: EIGHTH IEEE INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS, 2008, : 678 - 687
  • [3] Clustering short time-series microarray
    Ping, Loh Wei
    Abu Hasan, Yahya
    [J]. INTERNATIONAL CONFERENCE ON MATHEMATICAL BIOLOGY 2007, 2008, 971 : 39 - 46
  • [4] Distributed recognition of patterns in time series data
    Morrill, J
    [J]. COMMUNICATIONS OF THE ACM, 1998, 41 (05) : 45 - 51
  • [5] Clustering short time series gene expression data
    Ernst, J
    Nau, GJ
    Bar-Joseph, Z
    [J]. BIOINFORMATICS, 2005, 21 : I159 - I168
  • [6] Frame potential minimization for clustering short time series
    Springer, Tobias
    Ickstadt, Katja
    Stoeckler, Joachim
    [J]. ADVANCES IN DATA ANALYSIS AND CLASSIFICATION, 2011, 5 (04) : 341 - 355
  • [7] Frame potential minimization for clustering short time series
    Tobias Springer
    Katja Ickstadt
    Joachim Stöckler
    [J]. Advances in Data Analysis and Classification, 2011, 5 : 341 - 355
  • [8] Fuzzy Clustering for Incomplete Short Time Series Data
    Cruz, Lucia P.
    Vieira, Susana M.
    Vinga, Susana
    [J]. PROGRESS IN ARTIFICIAL INTELLIGENCE-BK, 2015, 9273 : 353 - 359
  • [9] TS-DENSE: Time Series Data Augmentation by Subclass Clustering
    Zanella, Rodrigo H.
    de Castro Coelho, Lucas A.
    Souza, Vinicius M. A.
    [J]. 2022 26TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2022, : 1800 - 1806
  • [10] Investigating Water Consumption Patterns Through Time Series Clustering
    Abu Waraga, Omnia
    Abdeljaber, Abdulrahman
    Abu Talib, Manar
    Abdallah, Mohamed
    [J]. 2021 14TH INTERNATIONAL CONFERENCE ON DEVELOPMENTS IN ESYSTEMS ENGINEERING (DESE), 2021, : 44 - 49