An Efficient Visual Assessment of Cluster Tendency Tool for Large-scale Time Series Data Sets

Cited by: 0
Authors
Iredale, Timothy B. [1]
Erfani, Sarah M. [1]
Leckie, Christopher [1]
Affiliation
[1] Univ Melbourne, Dept Comp & Informat Syst, Melbourne, Vic, Australia
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Data visualisation has always been a vital tool for exploring and understanding underlying data structures and patterns. However, emerging technologies such as the Internet of Things (IoT) have enabled the collection of very large amounts of data over time, and the sheer quantity of data available challenges existing time series visualisation methods. In this paper we present an introductory analysis of time series clustering with a focus on a novel shape-based measure of similarity that is invariant under uniform time shift and uniform amplitude scaling. Based on this measure, we develop a Visual Assessment of Cluster Tendency (VAT) algorithm to assess large time series data sets and demonstrate its advantages in terms of complexity and suitability for implementation in a distributed computing environment. The algorithm is implemented as a cloud application using Spark, where the run-time of the high-complexity dissimilarity matrix calculations is reduced by up to 7.0 times on a 16-core computing cluster, with even higher speed-up factors expected for larger computing clusters.
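To make the two ingredients of the abstract concrete, here is a minimal Python sketch under stated assumptions. The paper's own similarity measure is not given here, so `shape_distance` is a hypothetical stand-in: a simplified, circular-shift variant of the shape-based distance (SBD) used by k-Shape, which z-normalises each series (removing offsets and positive amplitude scaling) and returns one minus the maximum normalised cross-correlation over all circular shifts. `vat_order` follows the classic VAT reordering of Bezdek and Hathaway: start from an object involved in the largest dissimilarity, then repeatedly append the unordered object closest to the already-ordered set (a Prim-style minimum spanning tree order). All function names are illustrative and not taken from the paper.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def z_normalize(x: Sequence[float]) -> List[float]:
    """Remove offset and amplitude scale so only the shape remains."""
    n = len(x)
    mean = sum(x) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    if std == 0.0:
        return [0.0] * n  # a constant series carries no shape information
    return [(v - mean) / std for v in x]


def shape_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Hypothetical SBD-style distance in [0, 2]; 0 means identical shape.

    Assumption: circular cross-correlation, so circular time shifts and
    positive amplitude scaling (plus offsets) leave the distance at ~0.
    """
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("series must be non-empty and of equal length")
    zx, zy = z_normalize(x), z_normalize(y)
    n = len(zx)
    norm = math.sqrt(sum(v * v for v in zx) * sum(v * v for v in zy))
    if norm == 0.0:
        return 1.0  # treat a constant series as uncorrelated with anything
    # Best (unnormalised) cross-correlation over all circular shifts s.
    best = max(
        sum(zx[i] * zy[(i + s) % n] for i in range(n)) for s in range(n)
    )
    return max(0.0, 1.0 - best / norm)  # clamp tiny negative rounding error


def dissimilarity_matrix(series: Sequence[Sequence[float]]) -> Matrix:
    """Symmetric N x N matrix; every (i, j) pair is independent work."""
    n = len(series)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = shape_distance(series[i], series[j])
    return d


def vat_order(d: Matrix) -> List[int]:
    """Classic VAT ordering (Prim-style): a permutation of 0..n-1."""
    n = len(d)
    if n == 0:
        return []
    # Start from an object that takes part in the largest dissimilarity.
    start = max(range(n), key=lambda i: max(d[i]))
    order = [start]
    remaining = [j for j in range(n) if j != start]
    # closest[j]: smallest dissimilarity between j and any ordered object.
    closest = {j: d[start][j] for j in remaining}
    while remaining:
        nxt = min(remaining, key=lambda j: closest[j])
        order.append(nxt)
        remaining.remove(nxt)
        for j in remaining:
            closest[j] = min(closest[j], d[nxt][j])
    return order


def vat(d: Matrix) -> Tuple[List[int], Matrix]:
    """Return the VAT order and the correspondingly reordered matrix."""
    order = vat_order(d)
    return order, [[d[i][j] for j in order] for i in order]


if __name__ == "__main__":
    n = 16
    sine = [math.sin(2 * math.pi * t / n) for t in range(n)]
    spike = [1.0 if t == 3 else 0.0 for t in range(n)]
    series = [
        sine,
        spike,
        [3.0 * v + 5.0 for v in sine],  # scaled and offset copy of sine
        spike[-5:] + spike[:-5],        # spike circularly shifted by 5
    ]
    order, reordered = vat(dissimilarity_matrix(series))
    print(order)  # the two sines, and the two spikes, end up adjacent
    for row in reordered:
        print(" ".join(f"{v:.2f}" for v in row))
```

In the reordered matrix, clusters appear as low-dissimilarity blocks along the diagonal, which is what a VAT image displays. The pairwise loop in `dissimilarity_matrix` dominates the cost: O(N²) pairs, each O(L²) here with naive cross-correlation (FFT-based correlation would reduce a pair to O(L log L)). Because each pair is independent, this is the step that lends itself to distribution across workers, consistent with the abstract's use of Spark for the dissimilarity matrix calculations.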
Pages: 8