Time Series Symmetric Pattern Mining

Authors
Li P.-P. [1]
Song S.-X. [1,2,3]
Wang J.-M. [1,2,3]
Affiliations
[1] School of Software, Tsinghua University, Beijing
[2] National Engineering Laboratory for Big Data Software, Tsinghua University, Beijing
[3] Beijing National Research Center for Information Science and Technology, Tsinghua University, Beijing
Source
Ruan Jian Xue Bao/Journal of Software | 2022, Vol. 33, No. 3
Keywords
Distance measurement; Dynamic programming; Symmetric pattern; Time series
DOI
10.13328/j.cnki.jos.006453
Abstract
With the integration of informatization and industrialization, the Internet of Things and the industrial Internet have flourished, producing large volumes of industrial big data in the form of time series. Time series contain many valuable patterns, among which symmetric patterns are widespread. Mining symmetric patterns has important research value in fields such as behavior analysis, trajectory tracking, and anomaly detection. However, the data volume of a time series often reaches tens or even hundreds of gigabytes. Mining symmetric patterns with a direct nested query algorithm can take months or even years, and typical acceleration techniques such as indexing, lower bounds, and the triangle inequality yield speedups of at most one or two orders of magnitude. Inspired by the dynamic time warping algorithm, this study therefore proposes a method that mines all symmetric patterns of a time series T in O(w×|T|) time. Specifically, given a constraint on the symmetric pattern length, the symmetric subsequences are computed by interval dynamic programming, and the largest number of non-overlapping symmetric patterns is then selected by a greedy strategy. In addition, the study investigates an algorithm for mining symmetric patterns in time series data streams, which dynamically adjusts the window size according to the characteristics of the data in the window so that symmetric patterns are kept intact. Experiments on one synthetic data set and three real data sets under different data volumes show that, compared with other symmetric pattern mining methods, the proposed method performs better in terms of both pattern mining results and time overhead. © Copyright 2022, Institute of Software, the Chinese Academy of Sciences. All rights reserved.
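The two steps named in the abstract, interval dynamic programming followed by greedy selection, can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the recurrence, the reading of `w` as the maximum pattern length, and the names `symmetric_intervals`, `select_non_overlapping`, `min_len`, and `tau` are all assumptions made for illustration. The sketch scores a subsequence by a warped distance between its left and right halves (a DTW-style three-way minimum), fills roughly w×|T| cells, and then keeps the maximum number of non-overlapping candidates by choosing the earliest-ending interval first.

```python
from typing import List, Sequence, Tuple

Interval = Tuple[int, int]  # (start, end), both inclusive


def symmetric_intervals(T: Sequence[float], w: int, min_len: int,
                        tau: float) -> List[Interval]:
    """Interval DP over subsequence length (assumed recurrence, not the paper's).

    D(i, j) = |T[i] - T[j]| + min(D(i+1, j-1), D(i+1, j), D(i, j-1)),
    with D = 0 for intervals of length 0 or 1. An interval is reported as
    symmetric when min_len <= length <= w and D(i, j) <= tau.
    """
    n = len(T)
    found: List[Interval] = []
    shorter2 = [0.0] * (n + 1)  # distances for length L-2 (starts as length 0)
    shorter1 = [0.0] * n        # distances for length L-1 (starts as length 1)
    for length in range(2, min(w, n) + 1):
        current = []
        for i in range(n - length + 1):
            j = i + length - 1
            best = min(shorter2[i + 1],  # drop both ends: T[i+1..j-1]
                       shorter1[i + 1],  # warp: drop left end only
                       shorter1[i])      # warp: drop right end only
            d = abs(T[i] - T[j]) + best
            current.append(d)
            if length >= min_len and d <= tau:
                found.append((i, j))
        shorter2, shorter1 = shorter1, current
    return found


def select_non_overlapping(intervals: List[Interval]) -> List[Interval]:
    """Greedy interval scheduling: earliest end first maximizes the count."""
    chosen: List[Interval] = []
    last_end = -1
    for start, end in sorted(intervals, key=lambda p: (p[1], p[0])):
        if start > last_end:
            chosen.append((start, end))
            last_end = end
    return chosen


if __name__ == "__main__":
    series = [0, 1, 2, 3, 2, 1, 0, 5, 9, 5, 7]
    candidates = symmetric_intervals(series, w=7, min_len=3, tau=0.0)
    print(select_non_overlapping(candidates))
```

Because the greedy step maximizes the number of patterns, nested candidates such as a long palindrome and its shorter core compete, and the shorter, earlier-ending one is kept. A different objective (for example, preferring longer patterns) would need a different selection rule.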
Pages: 968-984
Page count: 16