Addressing Big Data Time Series: Mining Trillions of Time Series Subsequences Under Dynamic Time Warping

Cited by: 161
Authors
Rakthanmanon, Thanawin [1, 2]
Campana, Bilson [3]
Mueen, Abdullah [3]
Batista, Gustavo [4]
Westover, Brandon [5]
Zhu, Qiang [3]
Zakaria, Jesin [3]
Keogh, Eamonn [3]
Affiliations
[1] Univ Calif Riverside, Riverside, CA 92521 USA
[2] Kasetsart Univ, Dept Comp Engn, Bangkok, Thailand
[3] Univ Calif Riverside, Dept Comp Sci & Engn, Riverside, CA 92521 USA
[4] Univ Sao Paulo, Inst Ciencias Matemat & Comp, BR-05508 Sao Paulo, Brazil
[5] Brigham & Womens Hosp, Boston, MA 02115 USA
Funding
São Paulo Research Foundation (Brazil); National Science Foundation (USA)
Keywords
Algorithms; Experimentation; Time series; Similarity search; Lower bounds
DOI
10.1145/2500489
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Most time series data mining algorithms use similarity search as a core subroutine, and thus the time taken for similarity search is the bottleneck for virtually all time series data mining algorithms, including classification, clustering, motif discovery, anomaly detection, and so on. The difficulty of scaling a search to large datasets explains to a great extent why most academic work on time series data mining has plateaued at considering a few million time series objects, while much of industry and science sits on billions of time series objects waiting to be explored. In this work we show that by using a combination of four novel ideas we can search and mine massive time series for the first time. We demonstrate the following unintuitive fact: in large datasets we can exactly search under Dynamic Time Warping (DTW) much more quickly than the current state-of-the-art Euclidean distance search algorithms. We demonstrate our work on the largest set of time series experiments ever attempted. In particular, the largest dataset we consider is larger than the combined size of all of the time series datasets considered in all data mining papers ever published. We explain how our ideas allow us to solve higher-level time series data mining problems such as motif discovery and clustering at scales that would otherwise be untenable. Moreover, we show how our ideas allow us to efficiently support the uniform scaling distance measure, a measure whose utility seems to be underappreciated, but which we demonstrate here. In addition to mining massive datasets with up to one trillion datapoints, we will show that our ideas also have implications for real-time monitoring of data streams, allowing us to handle much faster arrival rates and/or use cheaper and lower-powered devices than are currently possible.
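The abstract does not spell out the four ideas, so the following is only a minimal sketch of the general setting it describes: exact subsequence search under DTW in which a cheap lower bound prunes candidates before the full DTW computation. It assumes z-normalized windows, a Sakoe-Chiba warping band of half-width `r`, squared distances, and the well-known LB_Keogh bound as the pruning test. The function names (`znorm`, `envelope`, `lb_keogh`, `dtw`, `search`) are hypothetical and are not taken from the paper or any library.

```python
import math


def znorm(x):
    """Return a z-normalized copy of x (all zeros if x is constant)."""
    n = len(x)
    mean = sum(x) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    if std == 0:
        return [0.0] * n
    return [(v - mean) / std for v in x]


def envelope(q, r):
    """Upper and lower envelope of q over a window of half-width r."""
    n = len(q)
    upper = [max(q[max(0, i - r):min(n, i + r + 1)]) for i in range(n)]
    lower = [min(q[max(0, i - r):min(n, i + r + 1)]) for i in range(n)]
    return upper, lower


def lb_keogh(c, upper, lower):
    """LB_Keogh: squared distance from candidate c to the query envelope.

    Never exceeds the band-constrained squared DTW distance computed by dtw().
    """
    total = 0.0
    for v, u, l in zip(c, upper, lower):
        if v > u:
            total += (v - u) ** 2
        elif v < l:
            total += (l - v) ** 2
    return total


def dtw(a, b, r):
    """Squared DTW distance between equal-length a and b, band half-width r."""
    n = len(a)
    prev = [math.inf] * (n + 1)
    prev[0] = 0.0  # empty prefix aligned with empty prefix
    for i in range(1, n + 1):
        cur = [math.inf] * (n + 1)
        for j in range(max(1, i - r), min(n, i + r) + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            cur[j] = cost + min(prev[j], prev[j - 1], cur[j - 1])
        prev = cur
    return prev[n]


def search(series, query, r):
    """Exact best-match search; returns (position, squared DTW, pruned count)."""
    m = len(query)
    q = znorm(query)
    upper, lower = envelope(q, r)
    best, best_pos, pruned = math.inf, -1, 0
    for start in range(len(series) - m + 1):
        c = znorm(series[start:start + m])
        # Skip the expensive DTW when the lower bound already rules it out.
        if lb_keogh(c, upper, lower) >= best:
            pruned += 1
            continue
        d = dtw(q, c, r)
        if d < best:
            best, best_pos = d, start
    return best_pos, best, pruned
```

Calling `search(series, query, r)` returns the start index of the best-matching window, its squared DTW distance, and how many windows were discarded by the lower bound alone. Because the bound never exceeds the true distance, pruning does not change the answer; it only avoids work, which is why the search stays exact.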
Pages: 31